Arabic, the fifth most popular language in the world, is spoken by more than 284 million people in some 22 countries, yet the Arabic Web is still in its infancy, constituting less than 1% of total Web content.
While many research findings on Web searching are available, little has been done on the theoretical and empirical aspects of non-English Web searching. Here, I review Web-search engines in a multilingual world and describe a framework that tries to address these issues. Experimental studies of three prototype Web search portals—in Chinese, Spanish, and Arabic—reveal how to best support non-English Web searching.
English has been the dominant language for information seeking on the Web, but not for the many non-English-speaking users who rely on their native languages to search and browse the Web. The process of information seeking consists of various stages of problem identification, definition, resolution, and solution presentation [12]. Two major information-seeking activities are searching and browsing. In searching, users first decompose their goal into smaller problems, then formulate keyword queries, and finally evaluate the results through serial search or systematic sampling. In browsing, users first transform their general information needs into a problem, then explore the Web content and hyperlinks through such browse-support tools as automatic summarization, clustering, visualization, and Web directories, ultimately evaluating the results by scanning through them.
Techniques proposed to support Web searching and browsing include meta-searching and Web-page preview and overview. Because different search engines employ different methods for page collecting, indexing, and ranking, they may include systematic bias in their search results [10]. Meta-searching is a promising method for alleviating this problem [4]. By sending queries to multiple search engines and collating the set of top-ranked results from each engine, meta-searching can greatly reduce bias in search results and improve coverage. In addition, post-retrieval analysis provides added value to results returned by search engines. Text-categorization techniques help filter Web-page content and provide previews of individual Web pages in the form of summaries. Document-categorization techniques help group Web pages, and document visualization techniques help amplify human cognition in browsing Internet search results. Though used in some search engines, including excite.com and vivisimo.com, meta-searching and information previews and overviews are rarely applied in non-English search engines.
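The collation step of meta-searching can be illustrated with a minimal Python sketch. It assumes each engine has already returned a ranked list of URLs, and it merges them by simple round-robin interleaving with duplicate removal. The text above does not specify a merging strategy, so the round-robin rule, the function name collate_results, the top_k parameter, and the engine names and URLs are all assumptions made for illustration.

```python
from itertools import zip_longest


def collate_results(ranked_lists, top_k=10):
    """Merge per-engine ranked URL lists by round-robin, dropping duplicates.

    ranked_lists maps a (hypothetical) engine name to its ranked URLs.
    Round-robin merging is an assumed strategy, not one given in the article.
    """
    merged = []
    seen = set()
    # Keep only the top_k results from each engine before merging.
    trimmed = [results[:top_k] for results in ranked_lists.values()]
    # zip_longest yields rank 1 from every engine, then rank 2, and so on;
    # shorter lists are padded with None, which is skipped below.
    for rank_slice in zip_longest(*trimmed):
        for url in rank_slice:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical engine names and URLs, for illustration only.
results_by_engine = {
    "engine_a": ["http://example.org/1", "http://example.org/2", "http://example.org/3"],
    "engine_b": ["http://example.org/2", "http://example.org/4"],
    "engine_c": ["http://example.org/5", "http://example.org/1"],
}

merged = collate_results(results_by_engine, top_k=3)
```

Because each engine contributes its best-ranked pages in turn, no single engine's ranking dominates the merged list, and pages found by only one engine still appear, which is how collation can widen coverage.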
Web searching in a multilingual world is characterized by cross-region and cross-country use of a language, producing regional effects in Web-site design and functionality. For example, Spanish is widely used in Europe, North America, and South America. Arabic is the primary language in the Middle East and North Africa. Chinese is the primary language in mainland China, Hong Kong, and Taiwan. The users of the Fast search engine (www.fastsearch.com), mostly European, input queries more frequently than Excite search-engine users, who focus more on e-commerce topics [11]. These results suggest regional differences on the Web.
SEARCH ENGINES
Several major search engines provide search services to non-English-speaking users. With more than 160 local domains, Google allows users to restrict search results to pages in 117 languages and provides translation services between English and eight European languages (Dutch, French, German, Greek, Italian, Portuguese, Russian, and Spanish), three Oriental languages (Chinese, both simplified and traditional; Korean; and Japanese), and Arabic. AltaVis-