PPVT: A Tool to Visualize Predictive Parsing
There also exists an alternate approach where a compiler-compiler is used to create a visual compiler for a given context-free
grammar. Visual compilers for different languages can be generated by providing different grammars to the compiler-compiler
as input. ANTLR [ 4] and VOCOCO [ 15] can be used to generate
LL( 1) parsers. GYACC [ 11] and Visual YACC [ 20] are used in
conjunction with YACC, and generate LALR parsers. CUPV [ 9]
is another tool that generates visual compilers with LALR parsers. LISA [ 12] can be used to generate several types of LL(k) and
LR(k) parsers. ANTLR and LISA are most versatile among these
tools, and can be used as comprehensive teaching aids.
A parsing algorithm visualizer is a more specialized tool. It
graphically visualizes one or more parsing algorithms. It can be
used to study the parsing process in more detail. A parsing algorithm visualizer can be used to study the construction and
the working of a parser. This makes parsing algorithm visualizers quite popular among faculty as well as students. Blythe et
al. [ 5] developed tools named LLparse and LRparse to visualize
LL( 1) and LR( 1) parsing algorithms, respectively. Khuri and Sugono [ 10] developed a tool to visualize LL( 1) and SLR( 1) parsing algorithms. JFLAP [ 8, 16] is undoubtedly the most sophisticated parsing algorithm visualizer to have been developed so
far. JFLAP can visualize backtracking-based recursive-descent,
LL( 1), SLR( 1) and CYK parsing algorithms. Alternatively, Compiler Construction Toolkit [ 7] can be used to visualize LL( 1),
LR(0) and SLR( 1) parsing algorithms. Some work has been done
to visualize recovery from syntax errors too [ 18]. Positive student feedback about these tools has been reported in literature.
In this article, we carry forward the work on parsing algorithm
visualizers. PPVT is fast and efficient. PPVT is highly scalable, i.e.,
it works well for large grammars
and input strings. The main advantage of PPVT over other parsing algorithm visualizers is in the
fact that it presents the output in
a textual format like that used in
textbooks. PPVT stepwise illustrates the construction of a predictive parser for a given grammar
and then the use of that parser to parse a given input string. Care
has been taken to show the calculations involved in each step.
THE PREDICTIVE PARSING
PPVT receives a context-free grammar from the user in a text
file. Once PPV T is run, the user is prompted to enter the name
of the grammar file. It is assumed that the grammar is LL( 1)
and there is no ambiguity, left-recursion or scope for left-fac-
toring in the grammar. A nonterminal is represented by a sin-
gle uppercase letter in the grammar, while a single lowercase
letter, digit, or special symbol represents a terminal. The ‘~’
character is used to represent ε. The ‘$’ character is used to
denote the stack start symbol as well as the end of string. The
head and the body of a production rule are separated by a ‘->’
mark. Each production rule is written in a new line. Two pro-
duction rules of the form A → α | β are written separately in
two different lines as A → α and A → β. The nonterminal at the
head of the first production rule is treated as the start symbol
of the grammar.
On receiving a grammar, PPVT starts with calculating the
FIRST sets for all terminals and nonterminals in the grammar.
The algorithm to calculate the FIRST sets is given in literature
[ 1]. However, the algorithm is optimized for calculation by
hand. To overcome this problem, the algorithm has been implemented in PPVT in two steps. In the first step, topological
sorting is used to determine the order in which the sets can be
calculated. In the second step, the sets are calculated. PPVT
next calculates the FOLLOW sets for all the nonterminals in
the grammar, again using a two-step method. Using the FIRST
and FOLLOW sets, PPVT now constructs a predictive parsing table for the given grammar. In the parsing table, there is a
row corresponding to each nonterminal and one column corresponding to each terminal in the grammar. If at any point in
the process of parsing, there is a nonterminal A at the top of the
stack and the current input symbol is a, then the production
rule in the cell [A, a] will be used. A cell in the parsing table
can contain at most one production rule. An attempt to fetch a
production rule from an empty cell in the parsing table denotes
a syntax error. All calculations done by PPVT till this point are
independent of the input string and the output is saved in a text
file. If file.txt is the name of the grammar file, then the output is
saved in file_Output.txt (Figure 1).
The user is now prompted to enter the string to be parsed.
PPVT then carries out a table-driven parsing of that string following the predictive parsing algorithm and using the parsing
table. In each step, PPVT prints the part of the string already
matched, the contents of the parsing stack and the input buffer, and the production rule used in that step. The result of the
parsing process—an acceptance or a syntax error—is printed at
the end. PPVT next prints a parse tree which visually expresses
the derivation of the given string following the given grammar.
Top-down parsers, including predictive parsers, are known
to emulate the leftmost derivation to derive a string from the
grammar start symbol. So, PPV T at last prints the leftmost derivation of the given string. These outputs based on the input
string are appended in file_Output.txt (Figure 1). PPVT has
been implemented in C++.
Figure 1: Input and output of PPVT.