es this complexity, with potentially ad hoc and unforeseen interactions between devices and services on top of the complex cloud and edge computing infrastructure most IoT services rely on.
One answer to this problem is to build applications in “silos” where the involved parties are known in advance, but with the side effect of locking devices and services in to a single company (for example, the competing smart-home offerings by leading technology companies). This is far from the IoT vision of a connected environment, yet most existing products fall into this category. There are obviously major business considerations behind this model, and it should be noted that the EU GDPR mandates some form of interoperability (although it is as yet unclear how this should be interpreted [12]).
An alternative to such “lock-in” would be to make devices’ consumption of data transparent and accountable. If data is exchanged across devices, the concerned user should be able to audit its usage. However, in an environment where arbitrary devices could interact (although it must be remembered that the EU GDPR requires explicit and informed user consent), how can trust be established in the audit record? This requires an in-depth rethinking of how IoT platforms are designed, potentially exploring the security-by-design approach based on hardware roots of trust [13] to provide trusted digital enclaves in which behavior can be audited. Some form of “accountability-by-design” principle should also be encouraged, where transparency and the implementation of a trustworthy audit mechanism are core concerns in product design.
Such solutions have been explored in the provenance space, for example, by leveraging SGX properties to provide a strong guarantee of the integrity of the provenance record [4]. Similarly, remote attestation techniques leveraging TPM hardware have been proposed [6] to guarantee the integrity of the capture mechanism. However, how to provide such guarantees in an IoT environment, where such hardware features may not be available, is a relatively unexplored topic.
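Even where SGX or TPM hardware is unavailable, audit records can at least be made tamper-evident in software. The sketch below is a minimal illustration of one such technique, a hash chain: each log entry commits to the digest of its predecessor, so rewriting or dropping an earlier entry invalidates every later one. The event strings are invented for illustration, and a real deployment would additionally anchor the chain in trusted hardware or a signed external log, which is omitted here.

```python
import hashlib
import json

def append(log, event):
    """Append an event, chaining it to the digest of the previous entry."""
    prev = log[-1]["digest"] if log else "0" * 64  # genesis value
    body = json.dumps({"event": event, "prev": prev}, sort_keys=True)
    log.append({"event": event, "prev": prev,
                "digest": hashlib.sha256(body.encode()).hexdigest()})

def verify(log):
    """Recompute every digest; any edit to an earlier entry breaks the chain."""
    prev = "0" * 64
    for entry in log:
        body = json.dumps({"event": entry["event"], "prev": prev}, sort_keys=True)
        if entry["prev"] != prev or \
           entry["digest"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev = entry["digest"]
    return True

log = []
append(log, "thermostat shared reading with cloud service")
append(log, "cloud service forwarded data to analytics provider")
assert verify(log)
log[0]["event"] = "nothing happened"   # an attacker rewrites history...
assert not verify(log)                 # ...and the chain no longer verifies
```

The chain makes tampering detectable but not preventable; without a hardware root of trust, an attacker who controls the whole device can simply rebuild the chain, which is precisely why the hardware-backed guarantees discussed above matter.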
Where Does the Audit Live?
The fully realized IoT vision is of vast
distributed and decentralized systems.
Modern computing systems contain many components that operate
as black boxes; they accept inputs and
generate outputs but do not disclose
their internal working. Beyond privacy
concerns, this also limits the ability to
detect cyber-attacks, or more generally
to understand cyber-behavior. Because
of these concerns, DARPA, in the U.S., launched the Transparent Computing project^g to explore means to build more
transparent systems through the use of
digital provenance with the particular
aim of identifying advanced persistent
threats. While DARPA’s work is a good
start, we believe there is an urgent need
to reach much further. In the remainder of this Viewpoint, we explore how
provenance can be an answer to some
IoT concerns and the challenges faced
to deploy provenance techniques.
Digital Provenance
There is a growing clamor for more transparency, but straightforward, widespread technical solutions have yet to emerge. Typical software log records often prove insufficient to audit complex distributed systems, as they fail to capture the causality relationships between events. Digital provenance [8] is an alternative means to record system events: the record of information flow within a computer system, used to assess the origin of data (for example, its quality or its validity).
The concept first emerged in the database research community as a means to explain the response to a given query [16]. Provenance research later expanded to address issues of scientific reproducibility, notably by providing mechanisms to reconstitute computational environments from formal records of scientific computations [23]. More recently, provenance has been explored within the cybersecurity community [25] as a means to explain intrusions [18] or to detect them [14].
Provenance records are represented as a directed acyclic graph (DAG) that shows the causality relationships between the states of the objects that compose a complex system. As a consequence, the representation is amenable to automated mathematical reasoning. In such a graph, the vertices represent the states of the objects (both transient and persistent), and the edges represent the causal relationships between those states.
^g See https://bit.ly/2Uf5bQY
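Such a graph, and the automated reasoning it enables, can be sketched in a few lines of Python. The object names (a thermostat reading, an aggregation service, a report) are invented for illustration; in a real system the capture mechanism would generate these records automatically.

```python
# A toy provenance record: vertices are object states, and each directed
# edge points from a state to the state(s) it was causally derived from.
edges = {
    "report:v1":    ["aggregate:v1"],
    "aggregate:v1": ["reading:thermostat", "reading:doorbell"],
    "reading:thermostat": [],
    "reading:doorbell":   [],
}

def ancestors(graph, state):
    """Walk the DAG to find every state that causally influenced `state`."""
    seen, stack = set(), list(graph.get(state, []))
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(graph.get(s, []))
    return seen

# Which data influenced the report sent to a third party?
print(sorted(ancestors(edges, "report:v1")))
# -> ['aggregate:v1', 'reading:doorbell', 'reading:thermostat']
```

An end user auditing data usage would ask the converse question (which downstream states a given sensor reading influenced), which is the same traversal over the reversed edges.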
An outcome of research on provenance in the cybersecurity space is the understanding that the capture mechanism must provide guarantees of completeness (all events in the system can be seen), accuracy (the record is faithful to events), and a well-defined, trusted computing base (the threat model is clearly expressed) [22]. Otherwise, attacks on the system may be undetected, concealed by the attacker, or misattributed. We argue that in a highly ad hoc and interoperable environment with mutually untrusted parties, the provenance used to empower end users with control and understanding over data usage requires similar properties.
Who to Trust?
In the IoT environment, the number of involved stakeholders has the potential to explode. Traditionally, a company managed its own server infrastructure, perhaps with the help of a subcontractor. The cloud computing paradigm further increased complexity with the involvement of cloud service providers (sometimes stacked, for example, the Heroku PaaS on top of the Amazon IaaS cloud service), third-party service providers (for example, CloudMQTT), and other tenants sharing the infrastructure. The IoT further increas-