structured sources of data to achieve the
goals of a clean, structured knowledge
graph. A useful tool for Facebook has
been to think of the graph as the model
and a Facebook page as the view—a projection of an entity or collection of entities that reside in the graph.
eBay is building its Product Knowledge Graph, which will encode semantic
knowledge about products, entities, and
their relationships with each other and
the external world. This knowledge will
be key to understanding what a seller is
offering and a buyer is looking for and
intelligently connecting the two, a key
part of eBay’s marketplace technology.
For example, eBay’s knowledge
graph can relate products to real-world
entities, defining the identity of a product and why it might be valuable to a
buyer. A basketball jersey for the Chicago Bulls is one product, but if it is
signed by Michael Jordan, it is a very different product. A postcard from 1940 in
Paris might be just a postcard; knowing
that Paris is in France and that 1940 is
during World War II changes the product entirely.
Entities in the knowledge graph
can also relate products to each other.
If a user searches for memorabilia of
Lionel Messi and the graph indicates
that Lionel Messi plays for Futbol Club
Barcelona, then merchandise
for that club may be of interest, too. Perhaps
memorabilia for other famous Barcelona players will be of interest to this
shopper. Related merchandise should
include soccer-based products such as
signed shirts, strips, boots, and balls.
This idea can extend from sports to music, film, literature, historical events,
and much more.
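A minimal sketch of this kind of lookup, assuming a toy in-memory triple store rather than eBay's actual system: the `TinyGraph` class, the `plays_for` predicate, and every player other than Lionel Messi are hypothetical names introduced only for illustration. Starting from the searched entity, it follows one relation outward to find the club and then back inward to find other players at the same club.

```python
from collections import defaultdict


class TinyGraph:
    """Toy triple store (hypothetical): edges are indexed in both directions."""

    def __init__(self):
        self._out = defaultdict(set)  # (subject, predicate) -> objects
        self._in = defaultdict(set)   # (object, predicate) -> subjects

    def add(self, subject, predicate, obj):
        self._out[(subject, predicate)].add(obj)
        self._in[(obj, predicate)].add(subject)

    def objects(self, subject, predicate):
        return set(self._out.get((subject, predicate), set()))

    def subjects(self, predicate, obj):
        return set(self._in.get((obj, predicate), set()))


def related_entities(graph, person):
    """Return the person's clubs and the other players linked to those clubs."""
    clubs = graph.objects(person, "plays_for")
    teammates = set()
    for club in clubs:
        teammates |= graph.subjects("plays_for", club)
    teammates.discard(person)  # the searched person is not "related" to themselves
    return clubs, teammates


graph = TinyGraph()
graph.add("Lionel Messi", "plays_for", "FC Barcelona")
graph.add("Example Player", "plays_for", "FC Barcelona")  # hypothetical player
graph.add("Other Player", "plays_for", "Another Club")    # hypothetical player

clubs, teammates = related_entities(graph, "Lionel Messi")
```

A search layer could then widen a query for the person to include merchandise tagged with any entity in `clubs` or `teammates`. How such expansions would be ranked or filtered is not described in the text and is left out here.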
Just as important as entity relations
is understanding the products themselves
and their relationships. Knowing
that one product is an iPhone and
another is a case for an iPhone is obviously
important. But the case might fit
some phones and not others, so eBay
needs to model the parts and accessory
sizes. Knowing the many variants and
relationships of products is also important:
Which products are manufacturer
variants of one product? Do they come
in different sizes, capacities, or colors?
Which are comparable—meaning they
have mostly the same specifications
but perhaps different brands or colors?
The system also needs to understand
provenance and an inferred confidence
level about the assertion.
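The product questions above (does an accessory fit, is one product a variant of another, are two products comparable) can be sketched as three small predicates over a product record. This is only an illustration under stated assumptions: the `Product` fields, the family-based fit rule, the 0.8 similarity threshold, and all product names are hypothetical and not taken from eBay's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    product_id: str
    family: str                    # manufacturer's model line (assumed field)
    brand: str
    specs: tuple = ()              # (name, value) pairs
    fits: frozenset = frozenset()  # device families an accessory fits


def fits_device(accessory, device):
    """An accessory fits a device if the device's family is listed on it."""
    return device.family in accessory.fits


def is_variant(a, b):
    """Manufacturer variants: same brand and family, different listing."""
    return a.product_id != b.product_id and a.brand == b.brand and a.family == b.family


def is_comparable(a, b, ignore=("color",), threshold=0.8):
    """Comparable: mostly the same specs, ignoring brand and the listed attributes."""
    sa = {k: v for k, v in a.specs if k not in ignore}
    sb = {k: v for k, v in b.specs if k not in ignore}
    keys = set(sa) | set(sb)
    if not keys:
        return False
    same = sum(1 for k in keys if sa.get(k) == sb.get(k))
    return same / len(keys) >= threshold


phone_64 = Product("p1", "PhoneModelA", "BrandOne",
                   (("capacity_gb", 64), ("color", "black"), ("screen_in", 6.1)))
phone_128 = Product("p2", "PhoneModelA", "BrandOne",
                    (("capacity_gb", 128), ("color", "black"), ("screen_in", 6.1)))
rival = Product("p3", "OtherPhone", "BrandTwo",
                (("capacity_gb", 64), ("color", "red"), ("screen_in", 6.1)))
case = Product("c1", "CaseLine", "BrandThree", fits=frozenset({"PhoneModelA"}))
```

Here `phone_64` and `phone_128` are variants of one product, `rival` is comparable to `phone_64` despite the different brand and color, and `case` fits only the `PhoneModelA` family.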
• Correctness does not mean the
knowledge graph always knows the
“right” value for an attribute, but rather
that it is always able to explain why a
certain assertion was made. Therefore,
it keeps provenance for all data that
flows through the system, from data acquisition to the serving layer.
• Structure means the knowledge
graph must be self-describing. If a piece
of data is not strongly typed or does not
fit the schema describing the entity,
then the graph attempts to do one of
the following: convert the data into the
expected type (for example, performing
simple type coercion or handling incorrectly formatted dates); extract structured data that matches the type (for
example, running natural language processing (NLP) on unstructured text such as
user reviews to convert it into typed slots);
or leave it out entirely.
• Lastly, the Facebook knowledge
graph is designed for constant change.
The graph is not a single representation
in a database that is updated when new
information is received. Instead, the
graph is built from scratch, from the
sources, every day, and the build system
is idempotent—producing a complete
graph at the end of it.
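The three properties above can be illustrated together in a small sketch. It is not Facebook's implementation: the `SCHEMA`, the attribute names, the accepted date formats, and the sample records are assumptions made for the example, and the NLP extraction path is omitted. Each raw value is either coerced to the schema type or left out, every kept assertion carries its source and raw value as provenance, and the graph is rebuilt from the sources on each call rather than updated in place.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Any


@dataclass(frozen=True)
class Assertion:
    entity: str
    attribute: str
    value: Any
    source: str  # provenance: where the raw value came from
    raw: Any     # provenance: the value as received, before coercion


SCHEMA = {"founded": date, "seats": int}            # hypothetical schema
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y/%m/%d")  # assumed input formats


def coerce(value, expected):
    """Convert value to the expected type, or return None if that fails."""
    if isinstance(value, expected):
        return value
    text = str(value).strip()
    if expected is int:
        try:
            return int(text)
        except ValueError:
            return None
    if expected is date:
        for fmt in DATE_FORMATS:
            try:
                return datetime.strptime(text, fmt).date()
            except ValueError:
                continue
    return None


def build_graph(records):
    """Build the whole graph from source records; no state survives between runs."""
    graph = {}
    for entity, attribute, raw, source in records:
        expected = SCHEMA.get(attribute)
        if expected is None:
            continue  # attribute is not in the schema: leave it out
        value = coerce(raw, expected)
        if value is None:
            continue  # cannot be typed: leave it out
        graph.setdefault(entity, []).append(
            Assertion(entity, attribute, value, source, raw))
    return graph


records = [
    ("Example Cafe", "founded", "03/02/2015", "page_owner"),
    ("Example Cafe", "seats", " 40 ", "page_owner"),
    ("Example Cafe", "seats", "lots", "user_review"),  # not coercible
    ("Example Cafe", "mood", "cozy", "user_review"),   # not in schema
]
graph = build_graph(records)
```

Because `build_graph` depends only on its input, rebuilding from the same sources yields the same graph, and any assertion can be traced back through its `source` and `raw` fields.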
An obvious place for a Facebook
knowledge graph to start is the Facebook pages ecosystem. Businesses and
people create pages on Facebook to
represent a huge range of ideas and interests. Furthermore, having the owner
of an entity make assertions about it is
a valuable source of data. As with any
crowd-sourced data, however, it is not
without its challenges.
Facebook pages are very public facing, and millions of people interact
with them every day. Thus, the interests of a page owner don’t always align
with the requirements of a knowledge
graph.
Most commonly, pages and entities
do not have a strict 1:1 mapping, as
pages can represent collections of entities (for example, movie franchises).
Data can also be incomplete or very unstructured (blobs of text), which makes
it more difficult to use in the context of
a knowledge graph.
Facebook’s biggest challenge has
been to leverage data found on its pages and to combine it with other more