The Estimand (ES) is a mathematical formula that, based on the Assumptions, provides a recipe for answering the Query from any hypothetical data, whenever they are available. After receiving the data, the engine uses the Estimand to produce an actual Estimate (ÊS) for the answer, along with statistical estimates of the confidence in that answer, reflecting the limited size of the dataset, as well as possible measurement errors or missing data. Finally, the engine produces a list of "fit indices" that measure how compatible the data are with the Assumptions conveyed by the model.

To exemplify these operations, assume our Query stands for the causal effect of X (taking a drug) on Y (recovery), written as Q = P(Y|do(X)). Let the modeling assumptions be encoded as in Figure 3, where Z is a third variable (say, Gender) affecting both X and Y. Finally, let the data be sampled at random from a joint distribution P(X, Y, Z). The Estimand (ES) derived by the engine (automatically, using Tool 2, as discussed in the next section) will be the formula ES = ∑_z P(Y|X, Z)P(Z), which defines a procedure of estimation: estimate the gender-specific conditional distributions P(Y|X, Z) for males and females, weight each by the probability P(Z) of membership in that gender, and take the average. Note that the Estimand ES defines a property of P(X, Y, Z) that, if properly estimated, would provide a correct answer to our Query. The answer itself, the Estimate ÊS, can be produced through any number of techniques that yield a consistent estimate of ES from finite samples of P(X, Y, Z). For example, the sample average (of Y) over all cases satisfying the specified X and Z conditions would be a consistent estimate. But more efficient estimation techniques can be devised to overcome data sparsity.[28] This task of estimating statistical relationships from sparse data is where deep-learning techniques excel, and where they are often employed.[33]
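To make this procedure concrete, the following minimal sketch computes the Estimand ∑_z P(Y|X, Z)P(Z) from a finite sample and contrasts it with the naive conditional average. The data-generating model, its coefficients, and the function name are all hypothetical, chosen only so that the adjusted and confounded answers visibly differ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data consistent with Figure 3: Z (gender) affects both
# X (taking the drug) and Y (recovery); all three variables are binary.
n = 100_000
z = rng.binomial(1, 0.5, size=n)                   # Z: gender
x = rng.binomial(1, np.where(z == 1, 0.8, 0.2))    # X: drug choice depends on Z
y = rng.binomial(1, 0.3 + 0.2 * x + 0.4 * z)       # Y: recovery depends on X and Z

def adjustment_estimate(x, y, z, x_val):
    """Estimate P(Y=1 | do(X=x_val)) via ES = sum_z P(Y=1 | X=x_val, Z=z) P(Z=z)."""
    total = 0.0
    for z_val in (0, 1):
        p_z = np.mean(z == z_val)                  # P(Z = z)
        stratum = (x == x_val) & (z == z_val)
        total += y[stratum].mean() * p_z           # P(Y=1 | X, Z) weighted by P(Z)
    return total

# Causal contrast obtained by the adjustment formula (true value 0.2 here):
adjusted = adjustment_estimate(x, y, z, 1) - adjustment_estimate(x, y, z, 0)
# Naive contrast P(Y=1|X=1) - P(Y=1|X=0), which absorbs the influence of Z:
naive = y[x == 1].mean() - y[x == 0].mean()

print(f"adjusted: {adjusted:.3f}   naive: {naive:.3f}")   # ~0.20 vs. ~0.44
```

The adjusted contrast recovers the causal effect built into the simulation, while the naive contrast is inflated by the confounding path through Z.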
Finally, the Fit Index for our example in Figure 3 will be NULL; that is, after examining the structure of the graph in Figure 3, the engine should conclude (using Tool 1, as discussed in the next section) that the assumptions encoded lack testable implications. Therefore, the veracity of the resultant estimate must lean entirely on the assumptions encoded in the arrows of Figure 3; neither refutation nor corroboration can be obtained from the data.c

c The assumptions encoded in Figure 3 are conveyed by its missing arrows. For example, Y does not influence X or Z, X does not influence Z, and, most important, Z is the only variable affecting both X and Y. That these assumptions lack testable implications can be concluded directly from the fact that the graph is complete; that is, there exists an edge connecting every pair of nodes.
The same procedure applies to more sophisticated queries, as in, say, the counterfactual query Q = P(y_x | x′, y′) discussed earlier. We may also permit some of the data to arrive from controlled experiments, which would take the form P(V|do(W)) when W is the controlled variable. The role of the Estimand would remain that of converting the Query into a syntactic form involving the available data and then guiding the choice of estimation technique to ensure unbiased estimates. The conversion task is not always feasible, in which case the Query is declared "non-identifiable" and the engine should exit with FAILURE. Fortunately, efficient and complete algorithms have been developed to decide identifiability and produce Estimands for a variety of counterfactual queries and a variety of data types.[3,30,32]
I next provide a bird's-eye view of seven tasks accomplished through the SCM framework, describing the tools used in each task and the unique contribution each tool brings to the art of automated reasoning.
Tool 1. Encoding causal assumptions: Transparency and testability. The task of encoding assumptions in a compact and usable form is not a trivial matter once an analyst takes seriously the requirement of transparency and testability.d Transparency enables analysts to discern whether the encoded assumptions are plausible (on scientific grounds) or whether additional assumptions are warranted. Testability permits us (whether analyst or machine) to determine whether the encoded assumptions are compatible with the available data and, if not, identify those that need repair.

d Economists, for example, having chosen algebraic over graphical representations, are deprived of elementary testability-detecting features.[21]
Advances in graphical models have made compact encoding feasible. Their transparency stems naturally from the fact that all assumptions are encoded qualitatively, in graphical form, mirroring the way researchers perceive cause-effect relationships in the domain; judgments of counterfactual or statistical dependencies are not required, since such dependencies can be read off the structure of the graph.[18] Testability is facilitated through a graphical criterion called d-separation, which provides the fundamental connection between causes and probabilities. It tells us, for any given pattern of paths in the model, what pattern of dependencies we should expect to find in the data.[15]
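To see what d-separation licenses, consider a chain model X → Z → Y (unlike Figure 3, this graph is not complete, so it does carry a testable implication: X ⊥ Y | Z). A minimal simulation sketch, with all coefficients hypothetical, checks that the predicted pattern of dependencies indeed shows up in data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Chain model X -> Z -> Y: d-separation predicts X and Y are dependent
# marginally but independent given Z.
n = 50_000
x = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)
y = 0.8 * z + rng.normal(size=n)

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing out c."""
    res_a = a - np.polyval(np.polyfit(c, a, 1), c)
    res_b = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(res_a, res_b)[0, 1]

print(f"corr(X, Y)     = {np.corrcoef(x, y)[0, 1]:+.3f}")  # substantial (~0.45)
print(f"corr(X, Y | Z) = {partial_corr(x, y, z):+.3f}")    # near zero, as predicted
```

A violated prediction (a clearly nonzero partial correlation, for example) would flag the encoded assumptions as incompatible with the data, which is exactly the testability that Tool 1 demands.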
Tool 2. Do-calculus and the control of confounding. Confounding, or the presence of unobserved causes of two or more variables, long considered the major obstacle to drawing causal inference from data, has been demystified and "deconfounded" through a graphical criterion called "backdoor." In particular, the task of selecting an appropriate set of covariates to control for confounding has been reduced to a simple "roadblocks" puzzle manageable through a simple algorithm.[16]
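This "roadblocks" puzzle can be illustrated with a brute-force sketch that checks whether a candidate covariate set satisfies the backdoor criterion in a small DAG. It enumerates paths explicitly, so it is suitable only for tiny graphs and is not the efficient published algorithm; all function names here are mine:

```python
def descendants(dag, node):
    """All nodes reachable from `node` along directed edges."""
    seen, stack = set(), [node]
    while stack:
        for child in dag.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def undirected_paths(dag, src, dst):
    """All simple paths between src and dst, ignoring edge direction."""
    nbrs = {}
    for u, vs in dag.items():
        for v in vs:
            nbrs.setdefault(u, set()).add(v)
            nbrs.setdefault(v, set()).add(u)
    def walk(path):
        if path[-1] == dst:
            yield list(path)
            return
        for nxt in nbrs.get(path[-1], ()):
            if nxt not in path:
                yield from walk(path + [nxt])
    yield from walk([src])

def blocked(dag, path, zset):
    """Is this path blocked by zset, per the d-separation rules?"""
    for i in range(1, len(path) - 1):
        a, m, b = path[i - 1], path[i], path[i + 1]
        if m in dag.get(a, ()) and m in dag.get(b, ()):   # collider a -> m <- b
            if not ({m} | descendants(dag, m)) & zset:
                return True
        elif m in zset:                                   # chain or fork at m
            return True
    return False

def satisfies_backdoor(dag, x, y, zset):
    """zset contains no descendant of x and blocks every path into x."""
    if zset & descendants(dag, x):
        return False
    return all(blocked(dag, p, zset)
               for p in undirected_paths(dag, x, y)
               if x in dag.get(p[1], ()))                 # first edge points into x

# Figure 3's model, as adjacency lists: Z -> X, Z -> Y, X -> Y.
dag = {"Z": ["X", "Y"], "X": ["Y"], "Y": []}
print(satisfies_backdoor(dag, "X", "Y", {"Z"}))   # True: {Z} closes X <- Z -> Y
print(satisfies_backdoor(dag, "X", "Y", set()))   # False: the backdoor path is open
```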
For models where the backdoor criterion does not hold, a symbolic engine is available, called "do-calculus," which predicts the effect of policy interventions whenever feasible and exits with failure whenever predictions cannot be ascertained on the basis of the specified assumptions.[3,17,30,32]
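For reference, do-calculus consists of three inference rules, stated here in Pearl's standard notation:

```latex
% Notation: $G_{\overline{X}}$ is the graph with all arrows entering X removed;
% $G_{\underline{Z}}$ has all arrows leaving Z removed; $Z(W)$ is the set of
% Z-nodes that are not ancestors of any W-node in $G_{\overline{X}}$.
\begin{align*}
\text{Rule 1: } & P(y \mid do(x), z, w) = P(y \mid do(x), w)
    && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}} \\
\text{Rule 2: } & P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w)
    && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\,\underline{Z}} \\
\text{Rule 3: } & P(y \mid do(x), do(z), w) = P(y \mid do(x), w)
    && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\,\overline{Z(W)}}
\end{align*}
```

A query is identifiable precisely when repeated application of these rules can rewrite it into an expression containing no do-operator.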
Tool 3. The algorithmization of counterfactuals. Counterfactual analysis deals with the behavior of specific individuals, identified by a distinct set of characteristics. For example, given that Joe's salary is Y = y, and that he went X = x years to college, what would Joe's salary be had he had one more year of education?
One of the crowning achievements of contemporary work on causality has been to formalize counterfactual reasoning within the graphical representation, the very representation researchers use to encode scientific knowledge. Every structural equation model determines the "truth value" of every counterfactual sentence. Therefore, an algorithm can determine if the probability of any such sentence is estimable from experimental or observational data, or a combination thereof.
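The three-step recipe by which such an algorithm evaluates a counterfactual (abduction: infer the exogenous terms from the evidence; action: modify the model; prediction: recompute the outcome) can be sketched for Joe's case in a toy linear SCM; every number below is hypothetical:

```python
# Toy structural equation for Joe's salary, with hypothetical coefficients:
#   salary = BASE + SLOPE * education + u,  where u captures Joe's
#   unobserved characteristics (ability, connections, and so on).
BASE, SLOPE = 20_000.0, 3_000.0

def counterfactual_salary(x_obs, y_obs, x_new):
    # Step 1 (abduction): infer Joe's exogenous term u from the evidence.
    u = y_obs - (BASE + SLOPE * x_obs)
    # Step 2 (action): replace the education equation by X = x_new.
    # Step 3 (prediction): recompute salary under the modified model,
    # holding Joe's u fixed.
    return BASE + SLOPE * x_new + u

joe_x, joe_y = 4, 76_000.0   # observed: four years of college, $76,000 salary
print(counterfactual_salary(joe_x, joe_y, joe_x + 1))   # -> 79000.0
```

In a linear model the answer is simply the observed salary plus the slope, but the same three steps apply unchanged to any structural equation model, linear or not.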