Viewpoints
Image by Andrey VP
Viewpoint
Building the Universal Archive of Source Code
A global collaborative project for the benefit of all.
DOI: 10.1145/3183558

Software is becoming the fabric that binds our personal and social lives, embodying a vast part of the technological knowledge that powers our industry and fuels innovation. Software is a pillar of most scientific research activities in all fields, from mathematics to physics, from chemistry to biology, from finance to social sciences. Software is also an essential mediator for accessing any digital information.

In short, a rapidly increasing part of our collective knowledge is embodied in, or dependent on, software artifacts. Our ability to design, use, understand, adapt, and evolve systems and devices on which our lives have come to depend relies on our ability to understand, adapt, and evolve the source code of the software that controls them.

Software source code is a precious, unique form of knowledge. It can be readily translated into a form executable by a machine, and yet it is human readable: Harold Abelson wrote “Programs must be written for humans to read,”¹ and source code is the preferred form for modification of software artifacts by developers.³ Unlike with other forms of knowledge, we have grown accustomed to using version-control systems that trace source code development and provide precious insight into its evolution. As Len Shustek puts it, “Source code provides a view into the mind of the designer.”⁴

And yet, we have not been taking good care of this precious form of knowledge.

Source code is spread around a variety of platforms and infrastructures that we use to develop and/or distribute it, and software projects often migrate from one to another: there is no universal catalog that tracks it all.

Software can be deleted, corrupted, or misplaced. What’s even more worrying, in recent years we have seen major code forges shut down, endangering hundreds of thousands of publicly available software projects at once.⁶

We clearly need a universal archive of software source code.

The deep penetration of software in all aspects of our world brings along failures and risks whose potential impact is growing. Users now understand the need for organized attention to software safety, security, reliability, and traceability. But unlike other scientific fields, we lack large-scale research instruments that enable massive analysis of all available software source code.

As computer scientists and professionals, it is our duty, responsibility, and privilege to build a shared infrastructure that answers these needs. Not just for our community, not just for the technical and scientific community, but for society as a whole.

Software Heritage (see https://www.softwareheritage.org) is an initiative launched at Inria, the French Institute for Research in Computer Science and Automation, precisely to take up this