so on. For purpose-built languages,
a deep connection to a task and the
user community for that task is often
worth more than clever design or elegant syntax.
Mutation and Hybridization
Mutation, some accidental and some
intentional, often plays a critical role
in the development of purpose-built
systems languages. One common
form of mutation involves adding a
subset of the syntax of one language
(for example, expressions or regular
expressions) to another language.
This type of mutation can be implemented using a preprocessor that
converts one high-level form to another or intermingles preprocessed
syntax with the target syntax of a destination language. Mutations may
diverge far enough that a new hybrid
language is formed. The parser generators
yacc and bison are the best-known examples of a complete hybrid
language: a grammar is declared as a
set of parsing rules intermingled with
C code that is executed in response to
the rules; the utilities then emit a finished C program that includes the rule
code and the code to execute a parsing
state machine on the grammar.
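
The shape of such a hybrid can be sketched in a few lines of Python. The toy generator below is closer in spirit to lex than to yacc, since it handles only pattern/action rules and builds no grammar or parsing tables. The rule-file format, the names SPEC, generate, load, and run, and the state dictionary are all assumptions made for illustration. What it shares with yacc and bison is the structure described above: rules intermingled with host-language code, and a tool that emits a finished program containing that code plus a driver.

```python
# Hypothetical rule file: a regular expression, whitespace, then a Python
# action. Actions may use `text` (the matched input) and `state` (a dict).
SPEC = r"""
[0-9]+      state["sum"] += int(text)
[a-z]+      state["words"].append(text)
\s+         pass
"""


def generate(spec):
    """Return Python source for a scanner built from pattern/action rules."""
    rules = [line.split(None, 1) for line in spec.strip().splitlines()]
    out = ["import re", ""]
    # One compiled pattern per rule, emitted at the top of the program.
    for i, (pattern, _) in enumerate(rules):
        out.append(f"RULE_{i} = re.compile({pattern!r})")
    out += ["", "def run(data, state):", "    pos = 0",
            "    while pos < len(data):"]
    # The driver tries each rule in order at the current input position.
    for i, (_, action) in enumerate(rules):
        keyword = "if" if i == 0 else "elif"
        out.append(f"        {keyword} (m := RULE_{i}.match(data, pos)):")
        out.append("            text = m.group()")
        out.append(f"            {action}")  # user code, copied verbatim
    out += ["        else:",
            "            raise SyntaxError('unexpected input at offset %d' % pos)",
            "        pos = m.end()",
            "    return state"]
    return "\n".join(out) + "\n"


def load(source):
    """Execute the emitted program and hand back its run() function."""
    namespace = {}
    exec(source, namespace)
    return namespace["run"]


run = load(generate(SPEC))
result = run("abc 12 de 30", {"sum": 0, "words": []})
print(result)  # {'sum': 42, 'words': ['abc', 'de']}
```

Printing generate(SPEC) shows the emitted program, with each action sitting verbatim inside the generated driver loop. This sketch requires Python 3.8 or later because the emitted code uses assignment expressions.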
Another example of this type of
mutation in early Unix was the Ratfor
(Rational Fortran) preprocessor developed by Brian Kernighan. Ratfor permitted
the author to write Fortran code with
C-style expressions and logical blocks, and
the result was translated into Fortran
syntax with line numbers and goto
statements, as shown in Figure 6.
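
As a rough illustration of that translation, here is a minimal Python sketch; it is not Ratfor itself. It assumes a hypothetical line-oriented input in which every `if (...) {` or `while (...) {` opener and every closing brace sits on its own line, and it starts generated labels at 23000, a choice made here for illustration. The names OPERATORS, fortran_condition, and translate are invented for the example.

```python
import re

# C-style operators and their Fortran spellings. Two-character operators
# come first so that ">=" is not mistaken for ">".
OPERATORS = [
    ("==", ".eq."), ("!=", ".ne."), (">=", ".ge."), ("<=", ".le."),
    ("&&", ".and."), ("||", ".or."), (">", ".gt."), ("<", ".lt."),
]

BLOCK_OPEN = re.compile(r"^(if|while)\s*\((.*)\)\s*\{$")


def fortran_condition(cond):
    """Rewrite C-style operators in a condition as Fortran operators."""
    for c_op, f_op in OPERATORS:
        cond = cond.replace(c_op, f_op)
    return cond


def translate(source, first_label=23000):
    """Translate brace-delimited if/while blocks into labels and gotos."""
    out = []
    stack = []  # one (keyword, top_label, exit_label) entry per open block
    next_label = first_label

    def emit(text, label=""):
        # Fixed-form Fortran: label in columns 1-5, statement from column 7.
        out.append(f"{label:<6}{text}")

    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = BLOCK_OPEN.match(line)
        if m:
            keyword, cond = m.group(1), fortran_condition(m.group(2))
            test = f"if(.not.({cond}))goto "
            if keyword == "while":
                top, exit_label = next_label, next_label + 1
                next_label += 2
                emit(test + str(exit_label), label=str(top))
            else:
                top, exit_label = None, next_label
                next_label += 1
                emit(test + str(exit_label))
            stack.append((keyword, top, exit_label))
        elif line == "}":
            if not stack:
                raise ValueError("unmatched closing brace")
            keyword, top, exit_label = stack.pop()
            if keyword == "while":
                emit(f"goto {top}")  # jump back and re-test the condition
            emit("continue", label=str(exit_label))
        else:
            emit(line)  # ordinary statements pass through unchanged
    if stack:
        raise ValueError("unclosed block")
    return "\n".join(out)


example = """
while (i <= n) {
    if (x > 0) {
        s = s + x
    }
    i = i + 1
}
"""
print(translate(example))
# 23000 if(.not.(i .le. n))goto 23001
#       if(.not.(x .gt. 0))goto 23002
#       s = s + x
# 23002 continue
#       i = i + 1
#       goto 23000
# 23001 continue
```

Each block costs one or two generated labels and an inverted test, which is why the structured form is so much easier to read than the output it produces.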
An even stranger mutant language
was a hybrid of C and Algol syntax developed using the C preprocessor and
used in the code for, what else, adb.
Apparently, Steve Bourne, the author
of the Algol-like Unix sh syntax, was determined that some of Algol’s genome
would carry on in the species. Some
sample code is shown in Figure 7.
Alas, a later version of the code was
run through the preprocessor and
then checked in to ease maintenance. Many later languages have
included more clearly designed crossbreeding to ease the transition from
one environment to another. Following the widespread adoption of C, its
expression syntax found its way into
an incredible number of new languages, little and big, including Awk,
C++, Java, JavaScript, D, Ruby, and
many others. Similarly, following the
success of Perl, many other scripting
languages adopted its useful extensions to regular expression syntax as
a new canonical form. Core concepts
such as expression syntax often form
the bulk of a small language, and borrowing from a well-established model
permits rapid language implementation and rapid adoption by users.
Symbiosis
In the development of a larger software system, little languages often
live in symbiotic partnership with the
mainstream development language
or with the software system itself. The
adb macro language described earlier
would likely not have survived outside
of the source-code base of its Unix
parent. The macro language of your
favorite spreadsheet is another example: it exists to provide a convenient
way to manipulate the user-visible abstractions of the containing software
application.
In the operating-system world,
my favorite little-known example of
symbiosis is the union of Forth and
SPARC assembly language created at
Sun as part of the work on the OpenBoot firmware. The idea was to create
a small interpreter used as the boot
environment on SPARC workstations.
Forth was chosen for the boot and
bring-up environment for
new hardware because the language
kernel was tiny and could be brought
up immediately on a new processor
and platform. Then, using the Forth
dictionaries, new commands could
be defined on the fly in the interpreter
for debugging. Since Forth permits its
dictionaries to override the definition
of words (tokens) in the interpreter,
someone came up with the creative idea
of using the interpreter as a macro
assembler for the hardware. A set of
dictionaries was created that redefined each of the opcodes in SPARC
(“ld,” “move,” “add,” and so on) with
Forth code that would compute the
binary representation of the assembled instructions and store them into
memory. Therefore, entire low-level
functions could be written in what
appeared to be assembly language,
prefixed with Forth headers, and
40 COMMUNICATIONS OF THE ACM | APRIL 2009 | VOL. 52 | NO. 4