Arabic, the fifth most popular language in the world, is spoken by more than
284 million people in some 22 countries, yet the Arabic Web is
still in its infancy, constituting less than 1% of total Web content.
While many research findings on Web
searching are available, little has
been done on the theoretical and
empirical aspects of non-English
Web searching. Here, I review Web-search engines in a multilingual world and describe a
framework that tries to address these issues. Experimental studies of three prototype Web search portals—in Chinese, Spanish, and Arabic—reveal how to
best support non-English Web searching.
English has been the dominant language for information seeking on the Web. But this is not the case
for many non-English-speaking users who rely on
their native languages to search and browse the Web.
The process of information seeking consists of various
stages of problem identification, definition, resolution, and solution presentation [12]. Two major
information-seeking activities are searching and
browsing. In searching, users first decompose their
goal into smaller problems, then formulate keyword
queries, and finally evaluate the results through serial
search or systematic sampling. In browsing, users first
transform their general information needs into a
problem, then explore the Web content and hyperlinks through such browse-support tools as automatic
summarization, clustering, visualization, and Web
directories, ultimately evaluating the results by scanning through them.
Techniques proposed to support Web searching
and browsing include meta-searching and Web-page
preview and overview. Because different search
engines employ different methods for page collecting,
indexing, and ranking, they may include systematic
bias in their search results [10]. Meta-searching is a
promising method for alleviating this problem [4]. By
sending queries to multiple search engines and collating
the set of top-ranked results from each engine,
meta-searching can greatly reduce bias in search
results and improve coverage. In addition, post-retrieval analysis provides added value to results
returned by search engines. Text-categorization techniques help filter Web-page content and provide previews of individual Web pages in the form of
summaries. Document-categorization techniques
help group Web pages, and document visualization
techniques help amplify human cognition in browsing Internet search results. Though used in some
search engines, including excite.com and
vivisimo.com, meta-searching and information previews and overviews are rarely applied in non-English
search engines.
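The collation step of meta-searching can be illustrated with a minimal sketch. It assumes each engine is available as a function that maps a query to a ranked list of URLs; the `Engine` type, the `meta_search` function, the stub engines, and the round-robin merging rule are all hypothetical choices made for illustration, not the method of any particular search service.

```python
from typing import Callable, Dict, List

# Hypothetical engine interface: a function from a query string
# to a ranked list of result URLs (best result first).
Engine = Callable[[str], List[str]]


def meta_search(query: str, engines: Dict[str, Engine], top_k: int = 10) -> List[str]:
    """Send the query to every engine, keep each engine's top_k results,
    and interleave them round-robin, dropping duplicate URLs."""
    ranked_lists = [engine(query)[:top_k] for engine in engines.values()]
    merged: List[str] = []
    seen = set()
    for rank in range(top_k):
        for results in ranked_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged


# Stub engines standing in for real search services (assumed data).
def engine_a(query: str) -> List[str]:
    return ["http://a.example/1", "http://shared.example/x", "http://a.example/2"]


def engine_b(query: str) -> List[str]:
    return ["http://shared.example/x", "http://b.example/1"]


results = meta_search("بحث", {"a": engine_a, "b": engine_b}, top_k=3)
print(results)
```

Interleaving by rank gives every engine's best results equal prominence, which is one simple way to dilute the bias of any single engine; a weighted or score-based merge would be an equally plausible design.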
Web searching in a multilingual world is characterized by cross-region and cross-country use of a language, producing regional effects in Web-site design
and functionality. For example, Spanish is widely
used in Europe, North America, and South America.
Arabic is the primary language in the Middle East and
North Africa. Chinese is the primary language in
mainland China, Hong Kong, and Taiwan. The users
of the Fast search engine (www.fastsearch.com),
mostly European, input queries more frequently than
Excite search-engine users, who focus more on
e-commerce topics [11]. These results suggest that
search behavior on the Web differs by region.
SEARCH ENGINES
Several major search engines provide search services
to non-English-speaking users. With more than
160 local domains, Google allows users to restrict
search results to pages in 117 languages and provides
translation services between English and eight European languages (Dutch, French, German, Greek,
Italian, Portuguese, Russian, and Spanish), three
Oriental languages (Chinese, simplified and traditional, Korean, and Japanese), and Arabic. AltaVis-