We briefly introduce the limitations of the explanation methods we have surveyed and then present explanation formats that might be more understandable and friendly to users.

Limitations of current explanations. A major limitation of existing work on interpretable machine learning is that the explanations are designed based on the intuition of researchers rather than on the demands of end users. Current local explanations are usually given in the format of feature-importance vectors, which constitute a complete causal attribution and a low-level explanation.23 This format is satisfactory if the explanation audience consists of developers and researchers, since they can use statistical analysis of the feature-importance distribution to debug the models. Nevertheless, it is less friendly if the explanation recipients are lay users of machine learning. It describes the full decision logic of a model, which contains a large amount of redundant information and can overwhelm users. The presentation formats could therefore be further enhanced to better promote user satisfaction.

Toward human-friendly explanations. Based on findings in the social sciences and human behavioral studies, we provide some directions toward user-oriented explanations, which might be more satisfying to humans as a means of communication.

Contrastive explanations. These are also referred to as differential explanations.22 They do not tell why a specific prediction was made, but rather explain why this prediction was made instead of another, so as to answer questions such as “Why Q rather than R?” Here Q is the fact that requires explanation, and R is the contrast case, which could be a real one or a virtual one. Consider, for instance, a user whose mortgage application is declined. The user may compare with another real case and ask: “Why didn’t I get a mortgage when my neighbor did?” On the other hand, the user may ask: “Why was my mortgage rejected?” Here the contrast case is implicit; the user is actually requesting an explanation for a virtual case: “How can I get my mortgage loan approved?” Since the comparison is with an event that has not happened, the desired explanation can also be called a counterfactual explanation.
To provide contrastive explanations for a model prediction, a similar strategy can be used for both of the comparisons mentioned earlier. We first produce feature-importance attributions for two instances: the user’s rejected case and the neighbor’s accepted case (or the user’s would-be-accepted case), and then compare the two attribution vectors. Note that we could resort to adversarial perturbation to find the would-be-accepted case. In addition, it is recommended to provide a diverse set of reasons, that is, to find multiple contrast cases, to make the explanation more informative. Ultimately, we generate explanations of the form: “Your mortgage was rejected because your income is lower than your neighbor’s, your credit history is not as strong as your neighbor’s, …” or “Your mortgage would be accepted if your income were raised from x to y.”
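A minimal sketch of this comparison is given below. It is only illustrative: the occlusion-style attribution, the predict scoring function, the baseline, and all variable names are assumptions introduced here, not methods prescribed by the survey, and any other attribution technique could be substituted.

    import numpy as np

    def occlusion_attribution(predict, x, baseline):
        # Leave-one-out attribution: drop in predicted score when each
        # feature is replaced by its baseline value.
        base_score = predict(x)
        attr = np.zeros(len(x))
        for i in range(len(x)):
            x_pert = x.copy()
            x_pert[i] = baseline[i]
            attr[i] = base_score - predict(x_pert)
        return attr

    def contrastive_reasons(predict, x_rejected, x_contrast, baseline,
                            feature_names, top_k=2):
        # Compare the attribution vectors of the rejected case and the
        # contrast (accepted) case; report the features with the largest gap.
        attr_rej = occlusion_attribution(predict, x_rejected, baseline)
        attr_con = occlusion_attribution(predict, x_contrast, baseline)
        gap = attr_con - attr_rej
        order = np.argsort(-gap)[:top_k]
        # Each entry: (feature, value in rejected case, value in contrast case).
        return [(feature_names[i], x_rejected[i], x_contrast[i]) for i in order]

Here x_contrast could be a real neighbor’s accepted application or a would-be-accepted variant of the user’s own application, found, for example, by perturbing the rejected instance until the model’s decision flips.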
Selective explanations. Usually, users do not expect an explanation to cover the complete cause of a decision. Instead, they wish the explanation to convey the most important information contributing to the decision. A sparse explanation, which includes a minimal set of features that helps justify the prediction, is preferred even though it is incomplete. Take the mortgage case again as an example: one good explanation could be to present users with the top two reasons contributing to the decision, such as a poor credit history or a low income-to-debt ratio.
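One way to realize such sparsity, sketched below under the same illustrative conventions as above, is to keep only the smallest set of features whose attributions account for a chosen fraction of the total attribution mass; the coverage threshold is an assumption, not something the survey specifies.

    import numpy as np

    def selective_explanation(attributions, feature_names, coverage=0.8):
        # Keep the fewest features whose absolute attributions account for
        # a given fraction of the total attribution mass.
        mass = np.abs(np.asarray(attributions, dtype=float))
        order = np.argsort(-mass)
        total = mass.sum()
        selected, covered = [], 0.0
        for i in order:
            selected.append(feature_names[i])
            covered += mass[i]
            if total == 0 or covered >= coverage * total:
                break
        return selected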
Credible explanations. A good explanation should be consistent with the prior knowledge of users. At the same time, it is equally important to assess the faithfulness of an explanation to the original model, which is often omitted in the existing literature. As mentioned earlier, explanations generated for a machine learning model are not always reasonable to humans, and it is extremely difficult to tell whether an unexpected explanation is caused by misbehavior of the model or by limitations of the explanation method. Therefore, better metrics to measure the faithfulness of explanations are needed to complement existing evaluation metrics. The degree of faithfulness determines how confidently we can trust an explanation. Nevertheless, the design of appropriate faithfulness metrics remains an open problem and deserves further investigation.
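One simple proxy sometimes used in the literature, sketched here under the same illustrative conventions as the earlier snippets, is a deletion test: occlude the features an explanation ranks as most important and check how much the model’s score drops. The function and parameter names are assumptions, not part of the survey.

    import numpy as np

    def deletion_faithfulness(predict, x, attributions, baseline, k=3):
        # Occlude the k features the explanation ranks as most important
        # and measure the resulting drop in the model's score. A larger
        # drop than occluding random features suggests the explanation is
        # more faithful to the model's actual behavior.
        top = np.argsort(-np.abs(np.asarray(attributions)))[:k]
        x_occluded = np.asarray(x, dtype=float).copy()
        x_occluded[top] = np.asarray(baseline)[top]
        return predict(x) - predict(x_occluded)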