The general idea builds upon elements shared with linked data and Schema.org: a graph data model of typed entities with named properties. The Knowledge Graph approach, at least in its Google manifestation, is distinguished in particular by a strong emphasis on up-front entity reconciliation, which requires curation discipline to ensure that new data is carefully integrated and linked to existing records. Schema.org’s approach can be seen as less noisy and less decentralized than linked data, but noisier and more decentralized than Knowledge Graphs. Because of this shared underlying approach, structured data expressed using Schema.org is a natural source of information for integration into Knowledge Graphs. Google documents some ways of doing so. 7
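To make "typed entities with named properties" concrete, here is a minimal Python sketch. The markup, the `to_triples` helper, and the `#book1`/`#author1` identifiers are hypothetical illustrations, not taken from the article. The sketch assumes a deliberately simplified reading of JSON-LD (one of the syntaxes Schema.org supports): every node carries an explicit `@id`, and context expansion, blank nodes, and array values are ignored. It flattens one Schema.org description into subject-property-object edges, the graph form in which such data could be merged with existing records.

```python
import json

# Hypothetical Schema.org description in JSON-LD syntax (illustrative data only).
MARKUP = """
{
  "@context": "https://schema.org",
  "@type": "Book",
  "@id": "#book1",
  "name": "Example Title",
  "author": {"@type": "Person", "@id": "#author1", "name": "Jane Doe"}
}
"""


def to_triples(node, triples=None):
    """Flatten a simplified JSON-LD node into (subject, property, object) triples.

    Assumption: every node has an "@id"; lists, blank nodes and @context
    expansion are not handled (full JSON-LD processing would need them).
    """
    if triples is None:
        triples = []
    subject = node["@id"]
    # The entity's type becomes an edge of its own.
    triples.append((subject, "type", node["@type"]))
    for key, value in node.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @context, @type, @id
        if isinstance(value, dict):
            # A nested entity: link to it by identifier, then describe it too.
            triples.append((subject, key, value["@id"]))
            to_triples(value, triples)
        else:
            # A literal-valued named property.
            triples.append((subject, key, value))
    return triples


if __name__ == "__main__":
    for triple in to_triples(json.loads(MARKUP)):
        print(triple)
```

Because the author is an entity with its own identifier rather than a bare string, a consumer can reconcile `#author1` against an existing record before adding the edges, which is where the curation discipline described above would come in.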
Here are some of the most important
lessons we have learned thus far, some
of which might be applicable to other
standards efforts on the Web. Most are
completely obvious but, interestingly,
have been ignored on many occasions.
1. Make it easy for publishers/developers to participate. More generally, when there is an asymmetry between the number of publishers and the number of consumers, put the complexity with the smaller group. Publishers have to be able to continue using their existing tools.
2. No one reads long specifications. Most developers tend to copy and edit examples. So, the documentation is more like a set of recipes and less like a specification.
3. Complexity has to be added incrementally, over time. Today, the average Web page is rather complex, with … being very simple, however, and the complexity was added mostly on an as-needed basis. Each layer of complexity in a platform/standard can be added only after adoption of more …
The idea that the Web infrastructure needs structured data mechanisms to describe entities and relationships in the real world has been around for as long as the Web itself. 1, 2, 13 The idea of describing the world using networks of typed relationships was well known even in the 1970s, and the use of logical statements about the world has a history predating computing. What is surprising is just how difficult it was for such seemingly obvious ideas to find their way into the Web as an information platform. The history of Schema.org suggests that rather than seeking directly to create "languages for intelligent agents," addressing vastly simpler scenarios from Web search has turned out to be the best practical route toward structured data for artificial per…
Over the past four years, Schema.org has evolved in many ways, both organizationally and in terms of the actual schemas. It started with a couple of individuals who created an informal consortium of the three initial sponsor companies. In the first year, these sponsor companies made most decisions behind closed doors. The project then opened up incrementally, first moving most discussions to W3C public forums, and then to a model where all discussion and decision making is done in the open, with a steering committee that includes members from the sponsor companies, academia, and the W3C.
Four years after its launch, Schema.org is entering its next phase, with more of the vocabulary development taking place in a distributed fashion. A number of extensions, for topics ranging from automobiles to product details, are already under way. In such a model, Schema.org itself is just the core, providing a unifying vocabulary and a congregation forum as necessary.
The increased interest in big data makes the need for common schemas even more relevant. As data scientists explore the value of data-driven analysis, the need to pull together data from different sources, and hence the need for shared vocabularies, is increasing. We are hopeful that Schema.org will contribute to this.
Schema.org would not be what it is today without the collaborative efforts of the teams from Google, Microsoft, Yahoo and Yandex. It would also be unrecognizable without the contributions made by members of the wider community who have come together via W3C.
1. Berners-Lee, T. Information management: a proposal;
2. Berners-Lee, T. W3 future directions, 1994; http://
3. Berners-Lee, T. Linked Data, 2006; http://www.w3.org/
4. Berners-Lee, T. Is your linked open data 5 star? 2010;
5. Berners-Lee, T., Hendler, J. and Lassila, O. The semantic web. Scientific American (May 2001), 29–37; http://www.
6. Friend of a Friend vocabulary (foaf); http://lov.okfn.org/
7. Google Developers. Customizing your Knowledge Graph, 2015; https://developers.google.com/
8. Guha, R.V. Good Relations and Schema.org. Schema
9. Raimond, Y. Schema.org for TV and radio markup. Schema Blog; http://blog.schema.org/2013/12/
10. Schema.org. Release log; http://schema.org/docs/
11. Schofield, J. Let’s be Friendsters. The Guardian (Feb. 19, 2004); http://www.theguardian.com/
12. Wallis, R. and Scott, D. Schema.org support for bibliographic relationships and periodicals. Schema
13. W3C. Describing and linking Web resources. Unpublished note, 1996; http://www.w3.org/
14. W3C. Library Linked Data Incubator Group Final Report, 2011; http://w3.org/2005/Incubator/lld/XGR-
15. W3C. Linked Data Cookbook; http://www.w3.org/2011/
16. W3C. Health Care and Life Science Linked Data Guide, 2012; http://www.w3.org/2001/sw/hcls/notes/
17. W3C. RDF Schema 1.1, 2014; http://www.w3.org/TR/
18. W3C. MCF Using XML, R.V. Guha, T. Bray, 1997; http://
R.V. Guha is a Google Fellow and a vice president in
research at Google. He is the creator of Web standards such
as RSS and Schema.org. He is also responsible for products
such as Google Custom Search. He was a co-founder of
Epinions.com and Alpiri and co-leader of the Cyc project.
Dan Brickley works at Google on the Schema.org initiative
and structured-data standards. He is best known for his
work on Web standards in the W3C community, where he
helped create the Semantic Web project and many of its
defining technologies. Previous work included metadata
projects around TV, agriculture, DLs, and education.
Steve Macbeth is a partner architect in the application and service group at Microsoft, where he is responsible for designing and building solutions at the intersection of mobile, cloud, and intelligent systems. Previously, he was a senior leader in Bing Core Search, general manager and co-founder of the Search Technology Center Asia, and founder and CTO of Riptide Technologies and pcsupport.com.
Copyright held by authors.