so on. For purpose-built languages, a deep connection to a task and the user community for that task is often worth more than clever design or elegant syntax.

Mutation and Hybridization

Mutation, some accidental and some intentional, often plays a critical role in the development of purpose-built systems languages. One common form of mutation involves adding a subset of the syntax of one language (for example, expressions or regular expressions) to another language. This type of mutation can be implemented using a preprocessor that converts one high-level form to another or intermingles preprocessed syntax with the target syntax of a destination language. Mutations may diverge far enough that a new hybrid language is formed. The parser tools yacc and bison are the best-known examples of a complete hybrid language: a grammar is declared as a set of parsing rules intermingled with C code that is executed in response to the rules; the utilities then emit a finished C program that includes the rule code and the code to execute a parsing state machine on the grammar.
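The "rules intermingled with host-language code, emitted as a finished program" pattern can be sketched in miniature. The Python example below is an illustrative toy under stated assumptions, not how yacc or bison work internally: the spec format, the names `SPEC`, `generate`, and `build`, and the variables `total`, `words`, and `text` are all invented for the example, and the generated program dispatches on regular expressions rather than running a parsing state machine.

```python
import re

# Hypothetical spec format: one rule per line, "<regex>  { <Python action> }".
SPEC = r"""
[0-9]+    { total += int(text) }
[a-z]+    { words.append(text) }
"""

RULE = re.compile(r"^(\S+)\s+\{\s*(.*?)\s*\}$")


def generate(spec):
    """Emit the source text of a Python program from the rule/action spec."""
    out = [
        "import re",
        "",
        "def scan(source):",
        "    total, words = 0, []",
        "    for text in source.split():",
    ]
    keyword = "if"
    for line in spec.strip().splitlines():
        match = RULE.match(line.strip())
        if match is None:
            raise ValueError(f"bad rule: {line!r}")
        pattern, action = match.groups()
        # The rule becomes a test; the action code is copied through verbatim.
        out.append(f"        {keyword} re.fullmatch({pattern!r}, text):")
        out.append(f"            {action}")
        keyword = "elif"
    if keyword == "if":
        raise ValueError("spec contains no rules")
    out.append("    return total, words")
    return "\n".join(out) + "\n"


def build(spec):
    """Compile the generated source and return its scan() function."""
    namespace = {}
    exec(generate(spec), namespace)
    return namespace["scan"]
```

Calling `build(SPEC)("12 abc 30 de")` returns `(42, ["abc", "de"])`, and printing `generate(SPEC)` shows the emitted program, in which the user's action code sits inside generated driver code.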

Another example of this type of mutation in early Unix was the Ratfor (Rational Fortran) preprocessor developed by Kernighan. Ratfor permitted the author to write Fortran code with C expressions and logical blocks, and the result was translated into Fortran syntax with line numbers and goto statements, as shown in Figure 6.
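The flavor of that translation can be sketched in a few lines of Python. This is a toy under stated assumptions, not Ratfor itself: the names `translate` and `fortran_cond`, the label numbering (starting at 10 in steps of 10), and the supported subset (only `if` and `while` blocks with one statement or brace per line) are choices made for the example.

```python
import re

# C-style relational operators and their Fortran 77 spellings.
OPS = {"==": ".eq.", "!=": ".ne.", ">=": ".ge.",
       "<=": ".le.", ">": ".gt.", "<": ".lt."}

HEADER = re.compile(r"^(if|while)\s*\((.*)\)\s*\{$")


def fortran_cond(cond):
    """Rewrite relational operators; two-character forms are tried first."""
    return re.sub(r"\s*(==|!=|>=|<=|>|<)\s*",
                  lambda m: f" {OPS[m.group(1)]} ", cond)


def translate(source):
    """Translate brace-structured blocks into labels and goto statements."""
    out, stack, next_label = [], [], 10

    def emit(text, label=""):
        # Fixed-form layout: label in columns 1-5, statement from column 7.
        out.append(f"{label:<5} {text}")

    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER.match(line)
        if header:
            kind, cond = header.groups()
            test = f"if (.not.({fortran_cond(cond)})) goto "
            if kind == "if":
                end = next_label
                next_label += 10
                emit(test + str(end))
                stack.append([(str(end), "continue")])
            else:  # while: label the test, jump back to it at the end
                top, end = next_label, next_label + 10
                next_label += 20
                emit(test + str(end), str(top))
                stack.append([("", f"goto {top}"), (str(end), "continue")])
        elif line == "}":
            if not stack:
                raise ValueError("unbalanced '}'")
            for label, text in stack.pop():
                emit(text, label)
        else:
            emit(line)  # ordinary statements pass through unchanged
    if stack:
        raise ValueError("missing '}'")
    return "\n".join(out)


EXAMPLE = """
while (i <= n) {
    if (x > 0) {
        s = s + x
    }
    i = i + 1
}
"""
```

Running `print(translate(EXAMPLE))` yields a labeled test for the loop, an inverted conditional jump for the `if`, and `continue` statements as jump targets, which is the general shape of output a structured preprocessor has to produce for a language with only labels and `goto`.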

An even stranger mutant language was a hybrid of C and Algol syntax developed using the C preprocessor and used in the code for, what else, adb. Apparently, Steve Bourne, the author of the Algol-like Unix sh syntax, was determined that some of Algol’s genome would carry on in the species. Some sample code is shown in Figure 7.

Alas, a later version of the code was run through the preprocessor and then checked in so as to ease maintenance. Many later languages have included more clearly designed crossbreeding to ease the transition from one environment to another. Following the widespread adoption of C, its expression syntax found its way into an incredible number of new languages, little and big, including Awk, C++, Java, JavaScript, D, Ruby, and many others. Similarly, following the success of Perl, many other scripting languages adopted its useful extensions to regular expression syntax as a new canonical form. Core concepts such as expression syntax often form the bulk of a small language, and borrowing from a well-established model permits rapid language implementation and rapid adoption by users.

Symbiosis

In the development of a larger software system, little languages often live in symbiotic partnership with the mainstream development language or with the software system itself. The adb macro language described earlier would likely not have survived outside of the source-code base of its Unix parent. The macro language of your favorite spreadsheet is another example: it exists to provide a convenient way to manipulate the user-visible abstractions of the containing software application.

In the operating-system world, my favorite little-known example of symbiosis is the union of Forth and SPARC assembly language created at Sun as part of the work on the OpenBoot firmware. The idea was to create a small interpreter used as the boot environment on SPARC workstations. Forth was chosen for the boot and hardware bring-up environment for new hardware because the language kernel was tiny and could be brought up immediately on a new processor and platform. Then, using the Forth dictionaries, new commands could be defined on the fly in the interpreter for debugging. Since Forth permits its dictionaries to override the definition of words (tokens) in the interpreter, someone came up with the creative idea of using the interpreter as a macro assembler for the hardware. A set of dictionaries was created that redefined each of the opcodes in SPARC (“ld,” “move,” “add,” and so on) with Forth code that would compute the binary representation of the assembled instructions and store them into memory. Therefore, entire low-level functions could be written in what appeared to be assembly language, prefixed with Forth headers, and

40 Communications of the ACM | April 2009 | Vol. 52 | No. 4
