including one on a public cloud.
As a graph, the Merkle DAG underpinning the archive consists of 10 billion
nodes and 100 billion edges; in terms of
resources, the compressed and fully de-duplicated archive requires some 200TB
of storage space. These figures grow
constantly, as the archive is kept up to
date by periodically crawling major code
hosting sites and software distributions,
adding new software artifacts, but never
removing anything. The contents of the
archive can already be browsed online,
or navigated via a REST API.f
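To make the de-duplication idea concrete, here is a minimal sketch of a content-addressed Merkle DAG. It is an illustration only, not the archive's actual data model or identifier scheme: the `MerkleStore` class, its method names, and the choice of SHA-1 over a tagged payload are all assumptions made for this example. Each node is identified by a hash of its content, so a file or directory shared by several versions is stored once and simply gains additional incoming edges.

```python
import hashlib


class MerkleStore:
    """Toy content-addressed store (hypothetical, for illustration only)."""

    def __init__(self):
        # node id -> (kind, payload); identical content maps to one entry
        self.nodes = {}

    def _put(self, kind, raw, payload):
        # Assumption: the id is a SHA-1 of the node kind plus its serialized content.
        node_id = hashlib.sha1(kind.encode() + b"\0" + raw).hexdigest()
        self.nodes.setdefault(node_id, (kind, payload))
        return node_id

    def add_blob(self, data):
        """Store file contents; returns the same id for identical bytes."""
        return self._put("blob", data, data)

    def add_tree(self, entries):
        """Store a directory given as {name: child id}; entries are sorted
        so the id does not depend on insertion order."""
        items = tuple(sorted(entries.items()))
        raw = "\n".join(f"{name} {child}" for name, child in items).encode()
        return self._put("tree", raw, items)

    def edge_count(self):
        """Count edges from directory nodes to their children."""
        return sum(len(p) for kind, p in self.nodes.values() if kind == "tree")


store = MerkleStore()
readme = store.add_blob(b"hello\n")
main_v1 = store.add_blob(b"int main(){}\n")
main_v2 = store.add_blob(b"int main(){return 0;}\n")

# Two versions of a project that share the same README.
v1 = store.add_tree({"README": readme, "main.c": main_v1})
v2 = store.add_tree({"README": readme, "main.c": main_v2})

print(len(store.nodes), store.edge_count())  # prints: 5 4
```

The two versions reference six objects in total (two directories and four file entries), yet only five nodes are stored, because the unchanged README is shared. The same effect, at a vastly larger scale, is what keeps a fully de-duplicated archive compact.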
We are at a unique turning point in
the history of computer science and
technology. Looking backward, we see
many important pieces of historical
software that are lost, misplaced, or behind barriers. On the other hand, many
of our founding fathers are still here.
They have the knowledge and the will
to share what is necessary to rebuild the
full history of our discipline—a unique
opportunity that no other field of science or technology has ever offered.
Looking to the future, we see software development skyrocketing. It is
urgent to build the missing infrastructure and put in place the good practices
necessary to ensure our entire software
commons will be properly collected
and preserved. Every year that goes by
without acting significantly increases
By launching Software Heritage,
Inria has made the initial effort, creating the archive infrastructure, establishing an agreement with UNESCO,
and assembling an initial group of
supportersg and committed sponsors,
including Microsoft, Intel, Société
Générale, Huawei, Google, GitHub,
Qwant, Nokia Bell Labs, DANS, FossID,
UQAM, and the University of Bologna.
Now we need to move forward, and
grow Software Heritage into an international common infrastructure.
Four ingredients are key to the success
of our mission: raising awareness
of the importance of source code as a
first-class citizen in our cultural heritage;
gathering the resources needed
to create the infrastructure; leveraging
the expertise from many fields of our
discipline; and building on a community
that shares the vision.
f See https://archive.softwareheritage.org/
g See https://www.softwareheritage.org/support/
As an open initiative, Software Heritage
strives to act as a host and a catalyzer
for this community, and we are
now calling for contributors to join
forces and tackle the issues highlighted
in this Viewpoint, and the many others
that will arise along the way. A few
of these issues include:
• For the collection phase, we need
help recovering important software
from the past and building adaptors for
the many hosting platforms and source
code distribution formats.
• For the preservation phase, we
need resources to host mirrors, as well
as contributors willing to try different
technologies for storing and mirroring
• For the sharing phase, help is
needed to organize the contents, to
build efficient indexing and querying
mechanisms, and to develop applications
for specific domains.
scientists, and IT professionals—have
a noble mission and a grand challenge:
let’s work together to deliver on it.
1. Abelson, H., Sussman, J., and Sussman, J. The
Structure and Interpretation of Computer Programs.
Preface by A.J. Perlis, MIT Press, 1985.
2. Di Cosmo, R. and Zacchiroli, S. Software Heritage: Why
and How to Preserve Software Source Code. iPRES 2017.
3. Free Software Foundation, Inc. The GNU General
Public License, Version 3, § 1, 2007.
4. Shustek, L.J. What should we collect to preserve the
history of software. IEEE Annals of the History of
5. Spinellis, D. A repository of Unix history and evolution.
Empirical Software Engineering, 2017.
6. Squire, M. The Lives and Deaths of Open Source Code
Forges. OpenSym, 2017.
Jean-François Abramatic (Jean-Francois.Abramatic@
inria.fr) is research director emeritus at Inria, the
French Institute for Research in Computer Science and
Roberto Di Cosmo ( firstname.lastname@example.org) is director of
Software Heritage at Inria, and professor of computer
science at IRIF, University Paris Diderot.
Stefano Zacchiroli ( email@example.com) is associate
professor of computer science at IRIF, University Paris
Diderot, and CTO of Software Heritage at Inria.
Copyright held by authors.