CS 642 Term Paper

A survey of distributed languages

Copyright © 1996 Alex Nicolaou. All rights reserved.




Abstract

The huge popularity of the World-Wide Web has provided a new demand for languages that support an elegant model for distributed programming so that the Internet can be fully utilized by programmers. Several new languages have appeared that attempt to provide solutions for the Web.
In this paper these languages are looked at as languages in their own right, rather than as special purpose Internet languages. By considering them as serious entries into the language market more interesting observations are possible, perhaps including an intuition as to which of the new languages will be best accepted and used in the future.
The languages considered are Java, Phantom, and Python.




1. Introduction


This paper's layout may be freely copied, but its content may not.
All of the languages provide some degree of network support, and this is an important factor in considering what languages are well suited to programming "for the internet". However, it is equally important that the languages support general programming tasks as well, since the general features are those that will make it easy or difficult to do the bulk of the programming. All of the languages have some features in common, but each also provides some unique features.
As would perhaps be expected in a world buzzing about "objects", all of the languages are object-oriented. Since that particular term has seen so much use as to be ambiguous, it seems it now needs a (re-)definition for any author who cares to use it. For the purposes of this article, a language shall be deemed object-oriented if it provides sub-typing, polymorphism, inheritance and dynamic binding.

Research in compiler technology for languages such as MIT Scheme and ML may ultimately discover new and more effective techniques for implementing closures and continuations, but these features are still amongst the most expensive aspects of languages which have them.
All the languages are also interpreted, which makes sense considering their intended application. Being interpreted means that they can be largely platform independent, and more easily migrate programs and objects from location to location as desired. Only Java, whose roots indicate a need for a compiled language, has really left the door open to efficient compilation in terms of current compiler technology.
A more surprising choice that all the languages have made is the support of "programming-in-the-large" features, in the form of reasonably well developed module systems that allow the programmer to create a larger project with a coherent organization. Considering the immediate applications anticipated, namely web programming and scripting (in the case of Python), this choice seems a strangely wise one. Perhaps "C" has taught the language community all too well that it is wise to plan ahead for a popular language, lest a poorly planned language be stretched far beyond the original intentions.
In terms of differences, the languages each have unique features that set them apart from each other and from their predecessors. Java is the least novel of the languages. Java's designers were aiming to produce a language that was similar to C++ so that the language would be popular, and their strategy is proving very successful. The main feature that sets Java apart from other languages is the investment in secure transmission of code so that the users of the object programs can execute any program provided on a web page without worrying about trojan horses. The irony is that the very decisions that have limited Java are the very decisions that will make the language successful.
Phantom provides the distributed semantics of Obliq, but is a substantially different language. It implements a mutation of Modula-3, which provides it with a well defined base upon which to build some more exotic features. The main advantage seen in Phantom is the handling of distributed objects, but unfortunately the work on the network support will not be completed, as the author feels that Phantom does not have the momentum to compete with Java.
Python seems to have been designed originally as a scripting language, but it has all the features that would enable it to compete with Java, including a web browser that can download and execute Python applets. The implementation focuses on the language itself as opposed to issues specific to distributed programming, and provides a rich set of functionality built into the language with the intention of making programming with Python easy. Unfortunately, the plethora of built-in operations on objects also gives the language's semantics a "fat" feeling, and places a large burden on even the simplest objects.




2. Classes and Objects


Classes are to functions what control-flow constructs were to goto's: a solution applied universally, even when not suited to the task.
Every one of the languages considered provides classes, inheritance and polymorphism. Java is modeled directly after C++, which helps to make its class system easy to learn. Phantom is also quite conventional, borrowing from and restricting Modula-3's class system. Python is the most peculiar of the three, treating the classes themselves as first class objects that can be modified on the fly.
Classes are supposed to bring three things to the language party: a clear way to define abstract data types, a convenient method to re-use implementations, and subtype polymorphism. In most object oriented languages abstract data types are just particular classes which hide implementation details that the user of the type should not need to see or know about. Inheritance is used to re-use implementation details, and subtype relations are usually also represented by the inheritance relation, allowing the programmer to use a derived class wherever a base class is expected. Each of the languages considered provides a portion of these features.




2.1. Java's class and interface model


Interfaces help to resolve multiple inheritance problems, but class structure also defines subtype relations so there isn't a clean break between sub-typing and implementation inheritance.
Java provides two ways to define a type. The first is by inheritance from an existing class: in this case the new class inherits all the type and implementation details of the base class, and can extend or modify it as appropriate. The second method is by the definition of an interface which defines all of the signatures of methods that must be provided by a class, but gives no implementation details.
Class inheritance is specified in the code by defining the new class as extending the old class. Interface implementation is defined by saying that a class implements a particular interface. A class may only extend one other class, but may implement as many interfaces as it desires. The multiple implementation of interfaces is analogous to multiple inheritance in other object oriented languages, but does not create the confusion of common base classes. When looking for which method to use there is no confusion, since there is only one possible implementation.
This separation simplifies inheritance issues greatly, but it is perhaps unfortunate that the language doesn't go all the way and prohibit classes from introducing new methods inside class bodies. To have done so would have truly separated sub-typing from implementation inheritance. As it currently stands, if one class author doesn't anticipate a need to multiply inherit from two classes, and thus doesn't define interfaces for them, the class user can never use the classes in the way that is desired. Unfortunately, even the class libraries provided with the distribution don't make good use of the interface feature, using interfaces rarely, and preferring to use class inheritance to create both subtypes and implementations, in the same way that is commonly seen in other object oriented languages.

The terminology police might object to the use of the word "method" to refer to any data or method member of a class.
As in C++, Java provides different levels of data hiding on a method-by-method basis. Private methods are the most restricted, and are visible only in the class in which they are defined. Protected methods have slightly more complicated semantics, of which for the moment the most interesting is that they are visible in the base class and in all derived classes. Finally, public methods provide the interface that is available to outside users of the class. Typically a class will declare everything as protected if it wishes to be easily modified or extended, reserving private for items that are likely to be changed in a later revision of the class.



2.2. Phantom's class model


Phantom's origins as a research project are never more clear than when considering the structure of the class system.
Phantom's class system is somewhat disappointing. The main feature lacking is some form of multiple inheritance, which has been entirely omitted, presumably to keep the language simple. However, no functionality is provided to replace multiple inheritance.
Data abstraction is provided in a way not seen in other widely known languages (it most closely resembles MOO, an object oriented language designed for developing multi-user servers). Read and write permission bits are used to control access to data members of a class, and execute permission is used to control visibility of a member function. The language specification does not make it clear what access derived classes are given to their parent classes, so it seems reasonable to assume that there is either no equivalent of Java's protected, or that everything is protected and there is no equivalent of private. Either way is unsatisfactory for larger projects.
Since no multiple inheritance is possible, sub-typing is somewhat more limited than in other object oriented languages, but for simple programming tasks single inheritance would suffice. For larger projects a slightly less object-centric programming paradigm would be used, so that one could make good use of Phantom's module features for organization rather than using interfaces and subclassing as one would in Java.




2.3. Python goes first class


First class classes seem to be a potential nightmare for the would-be compiler writer.
Python is the most dynamic of the three class systems considered here. Absolutely no data abstraction is provided by the language, which is sure to be a problem for developing larger pieces of software. In addition, no types are used in function declarations, giving a very polymorphic style to programs, since any type with the correct members is permitted as a parameter.
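As a minimal sketch of what "any type with the correct members" means in practice, the following uses two unrelated classes that happen to share a method name. The class and function names are hypothetical, chosen only for illustration, and the code assumes a current Python 3 interpreter rather than the 1996 release.

```python
# Two unrelated classes (hypothetical names) that happen to share a method.
class Duck:
    def speak(self):
        return "quack"


class Robot:
    def speak(self):
        return "beep"


def announce(thing):
    # No parameter type is declared; any object with a speak() method works.
    return "It says " + thing.speak()


results = [announce(Duck()), announce(Robot())]
print(results)
```

Neither class inherits from the other or from a common base; the call succeeds purely because the member is found at run time.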
What must be re-iterated to be absolutely clear is that classes are first class objects. This means that new functions and member variables can be added to a class at any time, and then new instances of the class created. Existing instances of the class are affected by changes to the class. There are even hooks to allow the class programmer to redefine what happens when a new member is added to a class if desired!
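The effect on existing instances can be seen in a short sketch. The Point class and its members are hypothetical, and the code assumes a current Python 3 interpreter.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(3, 4)  # instance created before the class is modified


def norm_squared(self):
    return self.x * self.x + self.y * self.y


# The class is an ordinary object, so members can be attached after the fact.
Point.norm_squared = norm_squared
Point.dimensions = 2

q = Point(1, 2)  # instance created after the modification

# Both the old and the new instance see the added members.
print(p.norm_squared(), q.norm_squared(), p.dimensions)
```

The instance p was created before norm_squared existed, yet it can call the method, because member lookup goes through the class at the time of the call.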

Some would say "dynamically typed".
In a typeless system, subtyping becomes irrelevant but implementation inheritance is still useful and allowed, including multiple inheritance with specific rules to allow the programmer to depend on how the interpreter deals with specific cases.
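One of the simplest of these rules can be sketched as follows: when two base classes define the same member, the base listed first is searched first. The class names are hypothetical, and the sketch assumes a current Python 3 interpreter; more involved hierarchies follow more detailed rules that are not shown here.

```python
class Logger:
    def describe(self):
        return "Logger.describe"


class Storage:
    def describe(self):
        return "Storage.describe"

    def save(self):
        return "Storage.save"


class LoggedStorage(Logger, Storage):
    # Inherits implementations from both bases; nothing is redefined here.
    pass


obj = LoggedStorage()
chosen = obj.describe()  # found in Logger, the first base listed
saved = obj.save()       # only Storage defines save
print(chosen, saved)
```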
Although a Smalltalk programmer might feel right at home in the Python environment, programmers used to strongly typed languages should watch their step at first. A great number of pitfalls exist that are not immediately obvious. One example is the behavior of the namespace: a data member hides a method member of the same name. The description of the mix of static and dynamic name resolution in the documentation is confusing, although the upshot is that name resolution basically behaves as one would expect it to.
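The hiding of a method by a data member can be sketched in a few lines. The Account class is hypothetical, and the code assumes a current Python 3 interpreter.

```python
class Account:
    def __init__(self):
        self.total = 0

    def balance(self):
        return self.total


a = Account()
before = a.balance()   # the method is found on the class

a.balance = 100        # a data member of the same name, stored on the instance
hidden = a.balance     # now yields the integer, not the method

try:
    a.balance()        # attempts to call the integer
    still_callable = True
except TypeError:
    still_callable = False

via_class = Account.balance(a)  # the method itself is untouched on the class
print(before, hidden, still_callable, via_class)
```

The method is not destroyed, only hidden for that one instance; it remains reachable through the class.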
The bottom line is that some would consider Python to be the most object oriented language recently developed, but working without the crutches of type-safety and static class objects can be harder at first.




3. Other Abstractions


Convenience features are a double edged sword, as often they are only convenient for the original implementor of a language.
Java provides the user with objects, and uses them as the universal solution to all problems. This helps keep the core language small, and allows much of the "standard functionality" to be provided as libraries written in Java. This approach can really simplify the job of creating new Java implementations. Unfortunately, it doesn't make the programmer's job any easier, since the class libraries provided with the language contain sparse documentation that frequently leaves one looking at the code to understand the features.
Phantom takes the middle road, providing some useful built-in features for handling lists and threads. The list features include working with a slice of the list, which is some subrange of the list's valid indexes, and an operator for appending lists. Although Scheme-style functions such as map and reduce are not provided in the language, they would be trivial to implement since closures are provided. Threads get an above-average treatment in Phantom, with synchronization built in as a keyword for class definitions and a mutex type for writing critical sections supported directly by the language specification.
Python goes the furthest of all, providing genuinely useful abstractions built into the language. The first encountered and most commonly seen abstraction is the sequence. Lists, strings, and tuples are all examples of the sequence type. All of these objects have convenient indexing, slicing, and concatenation facilities. The values a variable in a for loop takes on are the successive values of any sequence data type, leading to an intuitive and powerful looping construct.
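A brief sketch of the shared sequence operations follows. The variable names and values are illustrative only, and the code assumes a current Python 3 interpreter.

```python
text = "distributed"           # a string
items = [10, 20, 30, 40]       # a list
pair = ("java", "python")      # a tuple

first_letters = text[0:4]      # slicing a string
middle = items[1:3]            # slicing a list
last = items[-1]               # indexing from the end
joined = pair + ("phantom",)   # concatenating tuples

# The same for loop walks any of the three sequence types.
lengths = []
for seq in (text, items, pair):
    count = 0
    for element in seq:
        count = count + 1
    lengths.append(count)

print(first_letters, middle, last, joined, lengths)
```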

Dictionaries with keys which are not strings are a relatively recent addition to Python.
Dictionaries are another much-used and convenient workhorse of Python programs. Similar to associative arrays in Awk, dictionaries are used extensively both within the language implementation and by the programmer whenever a lookup table is needed.
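A small sketch of the lookup-table use described here, including a dictionary with non-string keys. The contents are made up for illustration, and the code assumes a current Python 3 interpreter.

```python
# A lookup table keyed by strings.
ports = {"http": 80, "ftp": 21}
ports["gopher"] = 70

# Keys need not be strings; tuples work as well.
grid = {(0, 0): "origin", (1, 0): "east"}

# Counting words, much as one would with an associative array in Awk.
counts = {}
for word in "to be or not to be".split():
    counts[word] = counts.get(word, 0) + 1

print(ports["gopher"], grid[(1, 0)], counts)
```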
A rarely seen extension present in Python is a more "mathematical" set of comparison operators. For example, a < b == c is taken to mean a is less than b, and b is equal to c. This seems strange only to the individual who has worked with computers for so long that it seems more natural to consider c to be a boolean value which is being compared to the result of the comparison a < b. In addition, the comparison operators are well defined for sequence types! The definition is set up so that comparison of strings works exactly as you would expect; that is, a sequence s1 is less than a sequence s2 if, at the first index i where they differ, s1[i] < s2[i].
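Both behaviors can be checked with a short sketch. The values are arbitrary, and the code assumes a current Python 3 interpreter. The last comparison covers a case the rule above does not mention: when one sequence is a prefix of the other, the shorter one compares as smaller.

```python
a, b, c = 1, 2, 2

chained = a < b == c      # read as (a < b) and (b == c)
c_style = (a < b) == c    # compares the boolean result of a < b with c

# Sequences compare element by element at the first differing index.
strings_ordered = "apple" < "apply"     # 'e' < 'y' at index 4
lists_ordered = [1, 2, 3] < [1, 2, 4]   # 3 < 4 at index 2
prefix_ordered = (1, 2) < (1, 2, 0)     # a proper prefix compares as smaller

print(chained, c_style, strings_ordered, lists_ordered, prefix_ordered)
```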
Python also provides closures, and (more recently) the expected functions such as map, reduce, etc. In addition, programs written in Python can create new definitions of functions, classes, or instances on the fly by passing to the interpreter a string which is compiled into the environment, so self-modifying code is easy to write. All in all, Python provides the most complete language support for the mundane tasks, which makes it extremely well suited to getting a prototype working quickly.
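These facilities can be sketched together as follows. All names are hypothetical, and the code assumes a current Python 3 interpreter, where reduce is imported from the functools module and exec is called as a function; the spelling in the 1996 interpreter differed.

```python
from functools import reduce  # in current Python, reduce lives in functools

squares = list(map(lambda n: n * n, [1, 2, 3, 4]))
total = reduce(lambda acc, n: acc + n, squares, 0)


def make_adder(k):
    # The inner function closes over k.
    def add(n):
        return n + k
    return add


add5 = make_adder(5)

# A new function definition compiled from a string at run time.
source = "def triple(n):\n    return 3 * n\n"
namespace = {}
exec(source, namespace)
tripled = namespace["triple"](7)

print(squares, total, add5(1), tripled)
```

Passing an explicit namespace dictionary to exec keeps the generated definitions separate from the rest of the program, which makes the self-modifying style somewhat easier to reason about.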




4. Modules


It seems ironic that so many little-used languages provide a strong module system while "professional" developers waste hours stretching C and C++ well beyond their limits in large software projects.
The basic goal of a module system is to package functionality in a way that can be easily bought and sold by the various consumers and producers of code. Each lower level library should sell a set of features to a higher level library, which in turn provides a set of features to still higher level libraries, and ultimately applications are little more than a layer of user-interface between the end user and the various engines that make up the product.
At first glance, it would seem that Phantom should have the strongest module system since it borrows from Modula-3, whose very name implies a strong module system. Although Phantom's system is reasonable, it is a watered down version of the original and doesn't provide any significant power not present in Java or Python.
Fundamentally, all three module systems provide the same type of functionality, packaged slightly differently. A module exports an interface (only Java allows a single module to export more than one interface) which is imported by a client who wishes to use the interface. Once imported all the public aspects of the interface are visible to the client. There are some syntactic differences in how modules are accessed, but these are simply the syntactic sugar (or salt) of each individual language. Perhaps the only difference of note is that in Java global functions and data have been eliminated, which means that packages of classes containing static functions are used instead. There is no particular advantage to this other than helping to keep the language small; ultimately this object-centric view could be a problem when global generic functions are desired.




5. Distributed Programming Features


In reality, neither Java nor Python supports distributed programming. What they've really done is distribute the interpreter.
There can be no doubt that the only reason Java is popular is that it is targeted at, and suited for, developing programs that will run in your web browser. Python has been used to build a browser with very similar features, and so could be considered a competitor to Java except for the lack of marketing. Phantom, on the other hand, provides real distributed programming features but has no application to show them off.
The main area where these languages are trying to shine is in the security department. Fear is the one thing that might keep people from using the features: fear that some virus will come into their computer via the internet and wreak havoc on their system. For this reason, the emphasis has been on providing a secure and restricted environment for these web applets to play in so that users can be amused and not be afraid.




5.1. Java's distribution model




In the security category, there can be no question that Java has the head start. When the interpreter loads Java bytecodes over the network they are checked by a verification process to help ensure that they don't perform any illegal actions, and run-time checks such as array-bounds checking are performed as well. The only fly in the ointment that is clear now is the use of the finalize() method. This method is similar to a C++ destructor in that it is called when an object is being destroyed. However, there are two things to worry about in terms of security and finalize(). The first is that this method can resurrect the object by creating a new reference to it, thus perhaps breaking the promise that when you leave a web page the Java objects are killed for you. The second is that there are no guarantees about which thread will invoke the finalize() call, which means potentially an important thread (such as the one running the garbage collection) could invoke and be killed by the code in a finalize() member. It is not clear whether these are real security holes or not as the author has not attempted to verify whether these attacks are possible, but the documentation certainly suggests that they are.
In terms of real distributed programming, the support in Java is disappointing. There is no way to access an object over the network, and no way to send an object across the network. Only code can be easily transmitted under the Java model which means that the data must be communicated by a more conventional socket-style communication.




5.2. Phantom's distributed objects model




Phantom provides the best support for distributed programming of all three languages. The closest equivalent to a pointer is really an interpreter location (specified as an IP address and port) and a 128-bit key which specifies the memory location. The intent is that these global locators are unforgeable, since the 128-bit address space is large and sparsely populated. Any given interpreter transparently translates access to a remote resource by communicating with the remote interpreter on the programmer's behalf. Although this system is supposed to be secure, it is not clear that it is in fact protected from malicious interpreters, which could create references to real objects with different types than their actual types in order to access data that should be inaccessible in the remote interpreter. However, provided that all of the interpreters are certified, the mechanism is safe and does not allow the application programmer to do inherently insecure operations in the language itself.
Since the interpreter creates closures for all the functions it encounters, logical and safe static scope rules govern what each function has access to. The semantics of distributed objects are borrowed from Obliq. Objects are always passed as references (global locators) to the actual instance, so the programmer can't easily send objects from place to place, although closures are transmitted so that object-like entities can be moved around. While this is a feature, one is led to wonder how slow transmitting closures must be, especially if the programmer isn't taking special care to keep the free variables few and far between.
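The idea of a locator made of an interpreter address plus a sparse 128-bit key can be sketched in Python. This is not Phantom code and not Phantom's implementation: the GlobalLocator and Interpreter names, the export and resolve methods, and the use of random keys are all assumptions made for illustration, and no networking is performed.

```python
import secrets
from collections import namedtuple

# Hypothetical locator: where the interpreter is, plus a 128-bit key.
GlobalLocator = namedtuple("GlobalLocator", ["host", "port", "key"])


class Interpreter:
    """Toy stand-in for an interpreter that hands out object locators."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self._objects = {}  # key -> exported object

    def export(self, obj):
        # 16 random bytes = 128 bits; the key space is large and sparse,
        # so guessing a valid key is impractical.
        key = secrets.token_bytes(16)
        self._objects[key] = obj
        return GlobalLocator(self.host, self.port, key)

    def resolve(self, locator):
        if (locator.host, locator.port) != (self.host, self.port):
            raise LookupError("locator refers to a different interpreter")
        return self._objects[locator.key]  # KeyError for an unknown key


node = Interpreter("192.0.2.1", 9000)
shared = [1, 2, 3]
loc = node.export(shared)

# A guessed key is rejected because nothing is registered under it.
forged = GlobalLocator("192.0.2.1", 9000, bytes(16))
try:
    node.resolve(forged)
    forged_rejected = False
except KeyError:
    forged_rejected = True

print(node.resolve(loc) is shared, forged_rejected)
```

The sketch only captures the unforgeability argument; it says nothing about the type-confusion concern raised above, which depends on what a remote interpreter is trusted to claim about an object.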




5.3. Python's distributed support




Python is in much the same state as Java, except that it currently has absolutely no security. It too can transmit code across the network, and it would be easy to transmit objects across the network as well, using the pickling facility. Naturally, security is being worked on, but it could be a rough game of catch-up to get to where Java already is.
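As a rough sketch of how pickling could carry an object across a connection, the following serializes a value to bytes and rebuilds it, with the network hop itself left out. The record contents are made up, and the code assumes the pickle module of a current Python 3 interpreter.

```python
import pickle

# An object graph built from ordinary built-in types.
record = {"sender": "alice", "payload": [1, 2.5, "three"], "tags": ("a", "b")}

wire_bytes = pickle.dumps(record)    # what would be written to a socket
received = pickle.loads(wire_bytes)  # what the other side would rebuild

# The receiver gets an equal but distinct copy, not a shared reference.
print(received == record, received is record, type(wire_bytes).__name__)
```

In keeping with the lack of security provisions noted above, unpickling data from an untrusted peer is unsafe, since a crafted pickle can cause arbitrary code to run on the receiving side.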



6. Miscellaneous Factors


Although syntactic sugar might be dismissed as unimportant, I've yet to meet a programmer who eats broccoli as his late night snack.
A variety of miscellaneous factors are in the end amongst the most compelling reasons to choose one language over another. Phantom is clearly out of the running, since work on the interpreter has stopped, and both Java and Python are so far ahead in terms of library support and general developer interest.
Sun's choice of a language to resemble C/C++ will certainly help Java's popularity. It feels comfortable - a little strange, perhaps, but basically comfortable. Any proficient C++ programmer is likely to find Java liberating. In particular, the freedom from header files, from memory management and from the C preprocessor means that the language is cleaner and faster to develop in than C++. Sun is also backing their language with interesting programming contests with substantial prizes. The media attention Java is getting will help ensure a high level of demand for Java programmers.
Java's biggest weakness is the lack of well documented and professionally finished libraries. The marketing push is so web-page oriented that the windowing toolkit is terribly hard to use for real application development, and more often than not the would-be Java programmer finds that reading the source is the quickest road to enlightenment.

Perhaps using spaces as a meaningful part of program input didn't go out with FORTRAN.
Python can clearly give Java some stiff competition. Unfortunately the syntax is just foreign enough to be badly disconcerting at first, especially the grouping of statements by indentation level. With some use, however, it wears well as an intuitive and powerful language.
The complete lack of static type checking means a more Smalltalk-like feel, but it also bodes ill for compilation to static code and for larger projects. Although Java's abstract window toolkit is weak, Python's own documentation recommends against using the stdwin module, since it has only been ported to X11 and Macintosh systems and lacks functionality that a serious application effort would want.




7. Conclusions




Phantom, although an interesting and worthwhile attempt, doesn't give us much to measure it by since there is no complete implementation and there will be no complete implementation. Although it offers the richest set of distributed programming features it isn't clear how viable they are without an implementation to play with.
Java and Python, however, are an evenly matched pair. On the one hand, Java provides things that Python cannot: a bytecode that has great potential to be compiled and optimized, a broad and rapidly growing user base, and a good deal of marketing hype. On the other, Python has a great array of features for scripting type tasks (which is after all the original goal of Python). Python's dynamic typing is either a blessing or a curse depending on the programmer's personal point of view, and similar things can be said about closures and first class classes. Overall it seems that Java is more suited to development of complicated systems, whereas Python's features may speed prototype development but stand in the way of complicated systems.



Alex Nicolaou. Last Modified on May 9th, 1996.