Why Are There No Alternatives?
Why are there no real alternatives to
the few popular search engine index
providers? Firstly, index providers
face huge technical difficulties due
to the large numbers of documents
resulting from the ever-changing
nature of the Web. A second, significant, issue is the cost of hardware,
infrastructure, maintenance, and
staff. Thirdly, the Web is huge, and
a search engine index needs to cover
as large a part of it as possible. While
we know that no search engine can
cover the Web in total, modern search
engines know of trillions of existing
pages.9 And indexing these pages is
only the start.
A search engine must keep its index
current, meaning it needs to update
at least a part of it every minute. This
is an important requirement that is
not being met by any of the current
projects (such as Common Crawl)
aiming at indexing snapshots of
(parts of) the Web.
Separating Index and Services
I am proposing an idea for a missing
part of the Web’s infrastructure, namely a searchable index. The idea is to
separate the infrastructure part of the
search engine (the index) from the services part, thereby allowing for a multitude of services, whether existing as
search engines or otherwise, to be run
on a shared infrastructure.
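To make the separation concrete, here is a minimal Python sketch of one shared index queried by two services, each applying its own ranking. All names (`SharedIndex`, `Document`, `make_service`, the two rankers) and the toy documents are hypothetical illustrations chosen for this example; they are not part of the proposal, and a real index would need far more (crawling, updates, scale).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class Document:
    url: str
    text: str
    last_crawled: int  # hypothetical crawl timestamp (larger = more recent)


class SharedIndex:
    """Infrastructure layer: stores documents and a simple inverted index."""

    def __init__(self) -> None:
        self._docs: Dict[str, Document] = {}
        self._postings: Dict[str, Set[str]] = {}

    def add(self, doc: Document) -> None:
        self._docs[doc.url] = doc
        for term in set(doc.text.lower().split()):
            self._postings.setdefault(term, set()).add(doc.url)

    def lookup(self, query: str) -> List[Document]:
        """Return every document containing all query terms, unranked."""
        terms = query.lower().split()
        if not terms:
            return []
        urls = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return [self._docs[u] for u in sorted(urls)]


Ranker = Callable[[List[Document]], List[Document]]


def make_service(index: SharedIndex, rank: Ranker) -> Callable[[str], List[str]]:
    """Services layer: a service is the shared index plus its own ranking."""
    def search(query: str) -> List[str]:
        return [d.url for d in rank(index.lookup(query))]
    return search


def by_freshness(docs: List[Document]) -> List[Document]:
    # Prefer the most recently crawled documents.
    return sorted(docs, key=lambda d: d.last_crawled, reverse=True)


def by_length(docs: List[Document]) -> List[Document]:
    # Prefer longer documents (a stand-in for "more detailed").
    return sorted(docs, key=lambda d: len(d.text), reverse=True)


index = SharedIndex()
index.add(Document("https://example.org/a", "open web index proposal", 3))
index.add(Document("https://example.org/b", "web index", 7))
index.add(Document("https://example.org/c", "search engine market", 5))

fresh_search = make_service(index, by_freshness)
detailed_search = make_service(index, by_length)

print(fresh_search("web index"))
print(detailed_search("web index"))
```

Both services see the same two matching documents but return them in opposite order, which is the point of the design: the index is shared, and the view of it is a property of each service.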
The accompanying figure shows
how the public infrastructure is
responsible for crawling the Web, for
indexing its content, and for providing
that every search engine presents its
own algorithmically generated view of
the Web’s content. Every such view can
be different, and none of them are the
definitive or correct one.
Problems that may arise from
search engines’ interpreting the world
in certain ways include: reinforcing stereotypes, for example, toward
women;7 influencing public opinion
in the context of political elections
(see, for example, Epstein and Robertson2); and preferring dramatic interpretations of rather harmless health-related symptoms.13
It seems unreasonable to have only
one (or a few) dominant search engines
imposing their view of the Web’s
content, a view that is, on closer
inspection, only one of many possible
views. Therefore, I argue for building an index of the Web
that will form the basis for a multitude of search engines and other services that are based on Web data.
Three Major Problems
There are three major problems resulting from a search engine market where
only a few competitors are equipped
with their own index of Web pages:
˲ A search engine provides only one
of many possible algorithmic interpretations of the Web’s content. At least
for informational queries (see Broder1), there is no correct set of results,
let alone one single correct result. For
these queries, we usually find a multitude of results of comparable quality.
While a search engine’s ranking might
provide some relevant results on the
highest positions, there may be many
more (or to some users, even better)
results on lower positions.
˲ Every search engine faces a conflict
of interest when it also acts as a content
provider and shows results from its
own offerings on its results pages (for
example, Google showing results from
its subsidiary YouTube). This problem
is exacerbated when one search engine has a large market share, as it is
able to increase both its influence on
its users and its suppression of its
competitors’ offerings.
˲ The more users rely on a single
search engine, the higher the influence of search engine optimization
(SEO) on the search results, and therefore, on what users get to see from the
Web. The aim of SEO is to optimize
Web pages so they get ranked higher
in search engines (that is, influencing
a search engine’s results). Taken together with the fact that SEO is now a
multibillion-dollar industry,12 we can
see huge external influences on search
engine results.
A Lack of Plurality
Considering these three problems,
we can see that in the current market
situation, we are far from plurality,
not only in terms of the number of
search engine providers but also in
the number of search results. A 2011
Yahoo study showed that while we can
regard a search engine as a possible
window to all of the Web’s content,
more than 80% of all user clicks were
found to go to only 10,000 different
domains.4 We can assume these numbers are comparable for other search
engines. Taken together, search engines have a huge influence on what
we as users get to see on the results
pages, and consequently, what we select from.
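The kind of concentration figure quoted above can be expressed as the share of clicks that go to the k most-clicked domains. The following Python sketch shows that calculation; the function name `top_k_click_share` and the ten-click toy log are invented for illustration and are not the study's data or method.

```python
from collections import Counter
from typing import Iterable


def top_k_click_share(clicked_domains: Iterable[str], k: int) -> float:
    """Fraction of all clicks that went to the k most-clicked domains."""
    counts = Counter(clicked_domains)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(k))
    return top / total


# Hypothetical click log: one entry per click, naming the clicked domain.
clicks = ["a.example"] * 6 + ["b.example"] * 2 + ["c.example", "d.example"]

share = top_k_click_share(clicks, 2)
print(f"Top 2 of {len(set(clicks))} domains receive {share:.0%} of clicks")
```

In this toy log, two of four domains receive 8 of 10 clicks. The study's figure describes the same kind of ratio, with k = 10,000 domains measured against all clicks.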
Figure. Separating services from infrastructure. (Components shown: WWW, OWI Crawler, OWI Basic Indexer, OWI Advanced Indexer, OWI Web Index, OWI Usage Data Index, OWI Interface / API, services such as Service 4, and users.)