and topic detection and tracking. Examples of applications are opinion-mining tools and market research
applications.
In the field of artificial intelligence,
the Open Web Index could be used as a
basis for large-scale machine learning.
Likely applications in this area are machine translation, question-answering,
and conversational applications.
Last but not least, an Open Web Index would provide a rich data source
for researchers in many different
fields, ranging from computer science and computational linguistics
to computational social sciences and
research evaluation.
This short list of ideas is clearly far
from complete and serves only illustrative purposes. It shows, however,
the huge potential of making Web data
open to all interested parties.
Alternative Approaches
Some alternative solutions have been
proposed for fostering plurality in the
search engine market. The first and
probably most obvious solution is to
wait for commercial market players
to develop alternatives. However, as
we have seen in the last 15 years or so,
Bing has been the only search engine
capable of gaining considerable market share. Other search engines have
failed, have been acquired by larger
search companies, or have focused on
niche markets. All new search engine
providers face the problem of having
to build their own index, which is, as
has been described earlier, a very costly undertaking. Furthermore, what
would be gained if we had one or two,
even three more search engines on the
market? From my point of view, the
problem lies not in having a few more
search engines, but in providing real
search plurality.
The second line of argumentation
says Google should be forced to provide
fair and unbiased results. This is what
the European Commission’s competition investigation against Google has
been all about. However, as ranking
results are always based on interpretations (and human assumptions inherent in the ranking algorithms), there
is no such thing as an unbiased result
set. Only a multitude of different algorithmic interpretations can help bring
about search plurality.
The third line of argumentation
calls for Google to open its index to
third parties. Then, it would be possible to build (search) applications on
top of Google’s index. However, the
control over the index—and over what
third parties would be able to get from
the index—would still lie in the hands
of a private company, the index would
still not be transparent, and there
would still be no influence on how the
index is composed.
The fourth, and already widely discussed, solution is building a publicly
funded search engine as an alternative
to the commercial enterprises. However, this again would only add one more
search engine to the market, instead of
fostering plurality.
Conclusion
The main idea I presented in this Viewpoint is to foster building search engines and other services needing Web
data on top of a public infrastructure
that is open to everyone. A multitude
of such services would foster plurality
not only in the search engine market
(with the result of having more than a
few search engines to choose from) but,
even more importantly, in the results
users get to see
when using search engines.
Search results as a basis for
knowledge acquisition in society
seem too important to be left solely
in the hands of a few commercial
enterprises. The Open Web Index
is comparable to other public services such as constructing roads and
railroad tracks, supporting public
broadcasting and, most notably,
building a library system. An Open
Web Index could be one of the main
building blocks of the library of the
21st century.
An Open Web Index is a project
that cannot and should not be undertaken by a single company or institution. On the contrary, I envision
building such an index as a task of
society and for society, meaning we
should build the index involving all
actors and interest groups relevant
to society at large. Those that benefit
from the index should have their say
in building it.
A question that remains is funding.
As a considerable amount of money
is needed, I argue for public funding
not by a single state, but rather by a
larger entity such as the European
Union. This, however, does not mean
a governmental body should also be
the operator of the Open Web Index.
Rather, it should be run by an organization that is relatively free from
state intervention. One could think
of a foundation running it or a model
similar to public broadcasting. Whatever the mode of operation, funding for a project of and for society should
be applied for the greater good.
Dirk Lewandowski (dirk.lewandowski@haw-hamburg.de)
is Professor for Information Research and Information
Retrieval at the Hamburg University of Applied Sciences in
Hamburg, Germany.
Copyright held by author.