Natural Language-like Queries
Though keyword querying remains
standard practice on the Web, savvy
users have been typing more detailed
queries for years, and Web search
engines have greatly improved their
ability to handle long queries. Research has shown that people prefer
natural expression of queries over keywords,3, 30 and Web search engine query length continues to increase. Experian Hitwise,22 a global
online competitive intelligence service, compared queries over a
four-week period (August–September
2010) with the same four-week period
in 2009 and found that searches of
five to eight words were up 10%, while
searches of one to four words
were down 2%. The growth of query
length suggests a desire to express
one’s information needs more thoroughly and may pave the way toward
full-sentence queries. Spoken queries are also likely to be full sentences
when speech recognition is faster and
more accurate.
Longer queries are also being
helped by the online use of colloquial
language. When most content is technical or scientific (as was characteristic of the early Web), there is less
likely to be an easy-to-find match between
a lay user’s words and the words used
in the informative documents. Popular question-answering sites (such as
Answers.com, Quora, and Yahoo Answers) that store user-generated content bridge colloquial and formal language directly in relevant documents;
for example, if a searcher needs a device to connect both a Wii and a DVD
player to a TV, but does not know what
that device is called, a keyword query
could fail. But the query “how do I
connect wii and dvd to my tv” turns
up a nearly perfect match on a question-answering site, with the solution
being a product called either “video
selector” or “two-way A/V switcher.”
The point is that, though the searcher
lacks the vocabulary to look up what
is needed, the searcher has the same
vocabulary as other people in the
same cognitive situation. The combination of text worded colloquially
and search engines that do a good job
with sentence-length queries helps
resolve the vocabulary problem. Considerable work has focused on how
to search question-answering sites1, 2;
ranking algorithms that make use of
these colloquial-to-formal mappings will continue to improve results for difficult queries.
Another technical development
that may help users who express
themselves through long queries is
systems that support quasi-natural
language interfaces. The new syntax
is tolerant of variations, relatively robust, and “exhibit[s] slight touches of
natural language flexibility.”25 These
interfaces are seen in Web search engines supporting various wordings
for certain kinds of questions that
retrieve answers from a database, as
in “Istanbul time,” “What is the time
in Istanbul?,” and “What time is it?
Istanbul.” Blekko allows query modification through a simple slash notation to refine results to predefined categories (such as “istanbul /tech” for
search results about technology and
“istanbul /people” for results labeled
relevant to people).
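To make the idea of a tolerant syntax concrete, here is a minimal sketch of how several wordings of a time question could be reduced to the same lookup, and how slash-style refinements could be split from the free-text part of a query. It is an illustration only and does not describe how any search engine actually works; the function names `parse_time_query` and `split_slashtags` and the filler-word list are assumptions introduced for this example.

```python
import re

# Hypothetical filler words ignored when looking for a place name.
_FILLER = {"what", "is", "the", "time", "in", "it", "at"}


def parse_time_query(query):
    """Return the place named in a time question, or None if it is not one."""
    words = re.findall(r"[a-z]+", query.lower())
    if "time" not in words:
        return None
    # Whatever is left after dropping filler words is treated as the place.
    place = [w for w in words if w not in _FILLER]
    return " ".join(place) if place else None


def split_slashtags(query):
    """Separate free-text terms from '/category' refinements."""
    terms, tags = [], []
    for token in query.split():
        if token.startswith("/") and len(token) > 1:
            tags.append(token[1:].lower())
        else:
            terms.append(token)
    return " ".join(terms), tags
```

Under these assumptions, "Istanbul time," "What is the time in Istanbul?," and "What time is it? Istanbul" all reduce to the same place name, while "istanbul /tech" splits into the search term and one category refinement. Ignoring word order and punctuation is what buys the tolerance; the cost is that the sketch cannot handle place names that contain a filler word.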
Miller et al.23 developed tools for
“sloppy commands,” meaning users
have considerable flexibility in how they
express a command, so memorization is not required to use
it. The “linguistic command line”
of Enso (later Ubiquity)8, 33 experimented with leniency in operating
system command lines. The Quicksilver application lookup tool for Apple
operating systems supports a hybrid
command/GUI interface, using continuous feedback to whittle down the
available choices to those commands
that still match what the user has
typed so far.
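The whittling-down behavior can be sketched as a filter that is reapplied after every keystroke. The example below is a rough illustration, not the algorithm used by Quicksilver or any of the tools cited here: it assumes that a command still "matches" if the typed characters appear in it in order (a subsequence match), and the command list and the names `is_subsequence`, `narrow`, and `feedback_steps` are hypothetical.

```python
def is_subsequence(typed, candidate):
    """True if the typed characters appear, in order, within candidate."""
    remaining = iter(candidate.lower())
    # 'ch in remaining' advances the iterator, so order is enforced.
    return all(ch in remaining for ch in typed.lower())


def narrow(commands, typed):
    """Keep only the commands still consistent with the typed input."""
    return [c for c in commands if is_subsequence(typed, c)]


def feedback_steps(commands, keystrokes):
    """Yield (typed_so_far, remaining_choices) after each keystroke."""
    typed = ""
    for ch in keystrokes:
        typed += ch
        yield typed, narrow(commands, typed)


# Hypothetical command list used only for illustration.
COMMANDS = ["Open File", "Open Folder", "Close Window", "Copy Path"]
```

With this list, typing "c" leaves "Close Window" and "Copy Path," and adding "w" leaves only "Close Window." Because the match is lenient about which characters are skipped, the user does not need to remember the exact command name, only enough of it to rule out the alternatives.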
The Wolfram Alpha search engine
provides a range of predefined query
types that mix structured forms with
some flexibility in word order, along
with a knowledge base and computational back-end able to handle certain
combinations of these inputs. For instance, the query “2 slices of pizza with
pepperoni” is decomposed into the
base information need (information
about pizza) refined by units (slices),
the quantity (two), and modifications
of the baseline concept (with pepperoni). The result is a table listing calorie
and nutrition information. However,
the system’s interpretive range is limited; the query “recipe for pizza with
pepperoni” returns the same measurement information as “pizza with pep-