on non-promising spans which are
unlikely to participate in the final derivation that is chosen. This motivates
agenda-based parsing, in which derivations that are deemed more promising
by the model are built first.
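As a toy sketch of this idea, the best-first parser below keeps partial derivations on a priority queue ordered by a model score, so higher-scoring spans are popped and combined first. The lexicon, the single combination rule, and the scoring function are illustrative stand-ins of my own, not part of any cited system.

```python
import heapq
import itertools

def agenda_parse(tokens, lexicon, score):
    """Best-first span parsing: partial derivations live on an agenda
    (a priority queue) ordered by model score, so the most promising
    derivations are popped and extended first."""
    tick = itertools.count()          # tie-breaker so the heap never compares values
    agenda, chart = [], {}            # chart: (i, j) -> best value found for that span
    def push(i, j, val):
        heapq.heappush(agenda, (-score(val), next(tick), i, j, val))
    for i, tok in enumerate(tokens):  # seed the agenda with lexical derivations
        if tok in lexicon:
            push(i, i + 1, lexicon[tok])
    while agenda:
        _, _, i, j, val = heapq.heappop(agenda)
        if (i, j) in chart:
            continue                  # a higher-scoring derivation already covers this span
        chart[(i, j)] = val
        # single toy rule: two spans adjacent around a "plus" token combine into their sum
        for (a, b), v in list(chart.items()):
            if b < len(tokens) and tokens[b] == "plus" and b + 1 == i:
                push(a, j, v + val)
            if j < len(tokens) and tokens[j] == "plus" and j + 1 == a:
                push(i, b, val + v)
    return chart.get((0, len(tokens)))

# Toy run: the "model score" here is simply the denotation itself.
lexicon = {"two": 2, "three": 3, "four": 4}
print(agenda_parse("two plus three plus four".split(), lexicon, score=lambda v: v))  # 9
```

With a trained model supplying `score`, the agenda order concentrates work on spans likely to appear in the final derivation, rather than filling the chart bottom-up exhaustively.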
Learner. In learning, we are given
examples of utterance-context-response
triples (x, c, y). There are two aspects of
learning: inducing the grammar rules
and estimating the model parameters. It
is important to remember that practical
semantic parsers do not do everything
from scratch, and often the hard-coded
grammar rules are as important as the
training examples. First, some lexical rules
that map named entities (for example,
[paris ⇒ ParisFrance]), dates, and
numbers are generally assumed to be given,37 though we need not assume these rules are perfect.21 These rules are also often represented implicitly.
How the rest of the grammar is
handled varies across approaches. In
CCG-style approach, inducing lexical
rules is an important part of learning.
In Zettlemoyer and Collins,37 a procedure called GENLEX is used to generate candidate lexical rules from an utterance–logical form pair (x, z). A more generic induction algorithm based on higher-order unification does not require any such hand-specified procedure.19 Wong and Mooney33
use machine translation ideas to induce
a synchronous grammar (which can
also be used to generate utterances from
logical forms). However, all these lexicon induction methods require annotated logical forms z. In approaches that learn from denotations y,4,21 an initial crude grammar is used to generate candidate logical forms, and the rest of the work is done by the features.
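To make the overgeneration step concrete, here is a minimal GENLEX-style sketch: every short span of the utterance is paired with every fragment of the annotated logical form, and learning is later relied on to prune pairings that never participate in high-scoring parses. The span-length cap and the toy logical-form fragments are illustrative choices of mine, not the exact templates of the cited systems.

```python
import itertools

def genlex_candidates(utterance, lf_fragments, max_span_len=2):
    """Overgenerate candidate lexical rules by pairing every short span
    of the utterance with every fragment of its logical form.
    Learning must later prune the (many) spurious pairings."""
    tokens = utterance.split()
    spans = [" ".join(tokens[i:j])
             for i in range(len(tokens))
             for j in range(i + 1, min(i + max_span_len, len(tokens)) + 1)]
    return set(itertools.product(spans, lf_fragments))

# The fragments would come from decomposing z; this toy set is hand-picked.
cands = genlex_candidates("major cities in texas",
                          ["major(x)", "city(x)", "loc(x, texas)"])
print(len(cands))                           # 7 spans x 3 fragments = 21
print(("texas", "loc(x, texas)") in cands)  # True
```

The useful pairing ("texas", "loc(x, texas)") is generated alongside twenty spurious ones, which is exactly the trade-off the learner's features and parameters must resolve.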
As we discussed earlier, parameter
estimation can be performed by SGD
on the log-likelihood; similar objectives based on max-margin are also possible.22 It can be helpful to also add an L1 regularization term λ‖θ‖1, which encourages feature weights to be exactly zero and produces a more compact model that generalizes better.
In addition, one can use AdaGrad,
which maintains a separate step size
for each feature. This can improve stability and convergence.
Datasets and Results
In a strong sense, datasets are the
main driver of progress for statistical
approaches. We will now survey some of
the existing datasets, describe their prop-
erties, and discuss the state of the art.
The Geo880 dataset36 drove nearly a
decade of semantic parsing research.
This dataset consists of 880 questions
and database queries about U.S. geography (for example, “what is the highest point in the largest state?”). The
utterances are compositional, but the
language is clean and the domain is
limited. On this dataset, learning from
logical forms18 and answers21 both
achieve around 90% accuracy.
The ATIS-3 dataset38 consists of 5,418
utterances paired with logical forms
(for example, “show me information on
american airlines from fort worth texas
to philadelphia”). The utterances contain
more disfluencies and flexible word
order compared with Geo880, but
they are logically simpler. As a result,
slot filling methods have been a successful paradigm in the spoken language understanding community for
this domain since the 1990s. The best
reported result is based on semantic
parsing and obtains 84.6% accuracy.
The Regexp824 dataset17 consists
of 824 natural language and regular
expression pairs (for example, “three
letter word starting with ‘X’”). The main
challenge here is that there are many
logically equivalent regular expressions, some aligning better to the natural language than others. Kushman and Barzilay17 use semantic unification to test for logical form equivalence and obtain 65.6% accuracy.
The Free917 dataset11 consists of 917
examples of question-logical form pairs
that can be answered via Freebase, for
example, “how many works did mozart
dedicate to joseph haydn?” The questions are logically less complex than those in the earlier semantic parsing datasets, but they introduce the new challenge of scaling up to many more predicates (in practice manageable by assuming perfect named entity resolution and leveraging the strong type constraints in Freebase). The state-of-the-art accuracy is 68%.
WebQuestions4 is another dataset
on Freebase consisting of 5,810 question–answer pairs (no logical forms)
such as “what do australians call their
money?” Like Free917, the questions
are not very compositional, but unlike
Free917, they are real questions asked
by people on the Web independent
from Freebase, so they are more real-
istic and more varied. Because the
answers are required to come from a
single Freebase page, a noticeable frac-
tion of the answers are imperfect. The
current state of the art is 52.5%.
The goal of WikiTableQuestions26 is
to extend question answering beyond
Freebase to HTML tables on Wikipedia,
which are semi-structured. The dataset
consists of 22,033 question-table-answer
triples (for example, “how many runners
took 2 minutes at the most to run 1500
meters?”), where each question can be
answered by aggregating information
across the table. At test time, we are given
new tables, so methods must learn how
to generalize to new predicates. The
best reported result on this new dataset is 37.1%.
Wang et al.30 proposed a new recipe
for quickly using crowdsourcing to
generate new compositional semantic
parsing datasets consisting of question–
logical form pairs. Using this recipe,
they created eight new datasets in small
domains consisting of 12,602 total ques-
tion-answer pairs, and achieved an aver-
age accuracy across datasets of 58.8%.
Chen and Mooney12 introduced a dataset of 706 navigation instructions (for
example, “facing the lamp go until you
reach a chair”) in a simple grid world.
Each instruction sequence contains multiple sentences with various imperative
and context-dependent constructions
not found in previous datasets. Artzi and
Zettlemoyer2 obtained 65.3% sentence-level accuracy.
We have presented a semantic parsing
framework for the problem of natural
language understanding. Going forward, the two big questions are: how to represent the semantics of language, and what supervision to use to learn the semantics.
Alternative semantic representations.
One of the main difficulties with semantic parsing is the divergence between the
structure of the natural language and
the logical forms—purely compositional
semantics will not work. This has led to
some efforts to introduce an intermediate layer between utterances and logical
forms. One idea is to use general paraphrasing models to map input utterances
to the “canonical utterances” of logical
forms.5,30 This reduces semantic parsing