Markov logic, as Bayesian networks can
be translated into MLNs.
Inference
Given an MLN model, the questions of
interest are answered by performing
inference on it. (For example, “What are
the topics of these Web pages, given the
words on them and the links between
them?”) Because an MLN acts as a template for a Markov network, we can
always apply standard Markov network
inference methods to the instantiated
network. However, methods that also
exploit the logical structure in an MLN
can yield tremendous savings in memory and time. We first provide an overview of inference in Markov networks,
and then describe how these methods
can be adapted to take advantage of the
MLN structure.
Markov network inference. The
main inference problem in Markov
networks is computing the probability
of a set of query variables Q given some
evidence E:
$$P(Q = q \mid E = e) = \frac{P(q, e)}{P(e)} = \frac{\sum_{h} \Phi(q, e, h)/Z}{\sum_{q', h'} \Phi(q', e, h')/Z} = \frac{Z_{q,e}}{Z_e} \qquad (4)$$
where H = X − Q − E denotes the remaining nonquery, nonevidence variables, Φ
is the unnormalized product of potentials from Equation 1, and $Z_{q,e}$ and $Z_e$ are
the partition functions of reduced
Markov networks, where the query and
evidence variables have been fixed to
constants. Thus, if we can compute partition functions, then we can answer
arbitrary probabilistic queries.
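To make Equation 4 concrete, here is a minimal brute-force sketch on a hypothetical three-variable network (not from the article), where each agreeing neighbor pair contributes a potential of 4: the conditional probability is simply the ratio of two restricted partition functions.

```python
import itertools

# Hypothetical network of three binary variables X0-X1-X2; each agreeing
# neighbor pair contributes a potential of 4, otherwise 1. phi(x) is the
# unnormalized product of potentials from Equation 1.
def phi(x):
    p = 1.0
    p *= 4.0 if x[0] == x[1] else 1.0
    p *= 4.0 if x[1] == x[2] else 1.0
    return p

def restricted_Z(fixed):
    """Partition function with the variables in `fixed` (index -> value) clamped."""
    return sum(phi(x)
               for x in itertools.product([0, 1], repeat=3)
               if all(x[i] == v for i, v in fixed.items()))

# P(X0 = 1 | X2 = 1) = Z_{q,e} / Z_e, exactly as in Equation 4.
p = restricted_Z({0: 1, 2: 1}) / restricted_Z({2: 1})  # 17/25 = 0.68
```

Enumerating all assignments is exponential in the number of variables, which is why the exact and approximate methods discussed next matter.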
In general, computing Z or answering other queries in a Markov network is #P-complete. When the Markov network has certain structural properties,
such as a tree or tree-like structure,
inference can be done in polynomial
time. For network structures with many
variables and loops, exact inference is
usually intractable and approximate
inference algorithms are used instead.
The two most common approaches to
approximation are to generate random samples through some random process that converges to the true distribution, or to solve a relaxed problem that
captures as many constraints from the
original as possible. Examples of the
former approach include Markov chain
Monte Carlo (MCMC) and importance
sampling, and examples of the latter
include loopy belief propagation and
variational methods.
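As a sketch of the sampling approach, the following implements Gibbs sampling, a simple MCMC method, on a hypothetical chain of three binary variables with agreement potentials; the model, weights, and function names are illustrative assumptions, not the article's.

```python
import math
import random

# Gibbs sampling (an MCMC method) on a hypothetical chain X0-X1-X2 of
# binary variables, where each agreeing neighbor pair contributes a
# potential of 4 (weight w = ln 4).
w = math.log(4.0)

def gibbs_estimate(burn_in=1000, samples=20000, seed=0):
    n_vars = 3
    evidence = {2: 1}             # observed: X2 = 1
    query_var, query_val = 0, 1   # estimate P(X0 = 1 | X2 = 1)
    rng = random.Random(seed)
    x = [evidence.get(i, 0) for i in range(n_vars)]
    hits = 0
    for t in range(burn_in + samples):
        for i in range(n_vars):
            if i in evidence:
                continue
            # P(x_i = v | rest) is proportional to exp(w * #agreeing neighbors).
            def score(v):
                s = 0.0
                if i > 0 and x[i - 1] == v:
                    s += w
                if i < n_vars - 1 and x[i + 1] == v:
                    s += w
                return math.exp(s)
            x[i] = 1 if rng.random() < score(1) / (score(0) + score(1)) else 0
        if t >= burn_in and x[query_var] == query_val:
            hits += 1
    return hits / samples

est = gibbs_estimate()  # converges toward the exact answer, 17/25 = 0.68
```

Unlike exact enumeration, each Gibbs sweep costs only linear time in the number of variables, at the price of a sampling error that shrinks with the number of samples.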
Any of these methods could be used
to perform inference in an MLN after
instantiating the ground network, and
many of them have been. However, inference remains challenging in Markov networks, and even more so in MLNs, which are often very large and have many loops. Next we discuss one of the most promising inference methods to date, which can take
advantage of logical structure, perform
exact inference when tractable, and be
relaxed to perform approximate inference when necessary.
Weighted model counting.
Computing the partition function in a
Markov network can be reduced to a
weighted model counting (WMC)
problem. Weighted model counting
finds the total weight of all satisfying
assignments to a logical formula F.
Following Chavira and Darwiche, 2 we
focus on literal-weighted model counting, in which each literal is assigned a
real-valued weight and the weight of
an assignment is the product of the
weights of its literals.
To represent Z as a WMC problem,
we need each assignment x to receive
weight Φ(x). Suppose each potential φi(x) evaluates to a constant Θi when a logical expression Fi is satisfied and 1
otherwise. (If the Markov network is not
already in this form, we can convert it
efficiently.) To define the WMC problem, for each potential φi, we introduce
a literal Ai with weight Θi. We also introduce a logical formula, Ai ⇔ Fi, so that Ai
is only true when Fi is satisfied. Thus,
the product of the weights of the Ai literals is exactly the product of the original
potential functions.
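The encoding can be sketched on a hypothetical one-potential network: the potential over (x0, x1) equals θ = 5 when F = (x0 ∨ x1) holds and 1 otherwise; we introduce a literal A with weight θ (and weight 1 when false) plus the constraint A ⇔ F. Brute-force WMC then recovers Z. All names and numbers here are illustrative.

```python
import itertools

# Literal-weighted WMC encoding for a toy network: the potential equals
# theta = 5 when F = (x0 or x1) holds, and 1 otherwise. The auxiliary
# literal A carries weight theta when true; all other literals weigh 1.
theta = 5.0

def F(x0, x1):
    return x0 or x1

def wmc():
    total = 0.0
    for x0, x1, a in itertools.product([False, True], repeat=3):
        if a != F(x0, x1):            # enforce the constraint A <=> F
            continue
        total += theta if a else 1.0  # product of the literal weights
    return total

# Direct computation of Z for comparison: both give 16.0.
Z = sum(theta if F(x0, x1) else 1.0
        for x0, x1 in itertools.product([False, True], repeat=2))
```

Because exactly one value of A is consistent with each assignment of (x0, x1), the weighted count over the extended formula equals the original partition function.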
WMC algorithms can then be used to
solve the problem and compute Z. One
approach is recursive decomposition, in
which we break the WMC problem into
two subproblems, one where some variable xi is fixed to true and one where xi is fixed to false. This requires exponential time in the worst case, but WMC algorithms can often exploit problem structure to solve it much faster in practice.
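A minimal sketch of recursive decomposition follows, omitting the caching and unit propagation that real WMC solvers rely on. The toy problem uses the A ⇔ F encoding described above, with F = (x0 ∨ x1) and A weighted 5 when true (a hypothetical example).

```python
# Recursive decomposition: branch on the next unassigned variable, fixing
# it to true and then to false, and sum the weighted counts of the two
# subproblems. At a leaf (all variables fixed), the weight is the product
# of the literal weights if the formula is satisfied, and 0 otherwise.
def wmc_recursive(formula, weights, order, assignment=None):
    assignment = assignment or {}
    if len(assignment) == len(order):
        if not formula(assignment):
            return 0.0
        w = 1.0
        for var, val in assignment.items():
            w *= weights[var][val]
        return w
    var = order[len(assignment)]
    return (wmc_recursive(formula, weights, order, {**assignment, var: True}) +
            wmc_recursive(formula, weights, order, {**assignment, var: False}))

weights = {"x0": {False: 1.0, True: 1.0},
           "x1": {False: 1.0, True: 1.0},
           "A":  {False: 1.0, True: 5.0}}
f = lambda s: s["A"] == (s["x0"] or s["x1"])
Z = wmc_recursive(f, weights, ["x0", "x1", "A"])  # 16.0
```

Practical solvers prune branches whose partial assignment already falsifies the formula and decompose independent components, which is where the large speedups come from.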
Another approach is to compile the
model into a logical form where WMC is
tractable, such as d-DNNF, and build an
arithmetic circuit based on it. 2 Once
compiled, the arithmetic circuit can be
reused for multiple queries.
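A hand-built sketch of such a circuit, for the same hypothetical one-potential model (θ = 5 when x0 ∨ x1 holds, else 1): indicator leaves mark which values of each variable are consistent with the evidence, so re-evaluating one compiled circuit answers different queries.

```python
# Hand-compiled arithmetic-circuit sketch (hypothetical model): leaves are
# indicator variables and parameter weights; internal nodes are sums and
# products, mirroring a decomposition on x0 and then x1.
theta = 5.0

def circuit(ind):
    # ind[var][value] = 1 if that value is allowed by the evidence, else 0.
    x0_true_branch = ind["x1"][1] * theta + ind["x1"][0] * theta  # x0 = 1
    x0_false_branch = ind["x1"][1] * theta + ind["x1"][0] * 1.0   # x0 = 0
    return ind["x0"][1] * x0_true_branch + ind["x0"][0] * x0_false_branch

all_on = {"x0": {0: 1, 1: 1}, "x1": {0: 1, 1: 1}}
Z = circuit(all_on)                                   # 16.0
evidence = {"x0": {0: 0, 1: 1}, "x1": {0: 1, 1: 1}}   # clamp x0 = 1
p = circuit(evidence) / Z                             # P(x0 = 1) = 10/16
```

Compilation pays its cost once; after that, each query is a single linear-time pass over the circuit.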
Probabilistic theorem proving. WMC
is a natural approach to inference in
MLNs, as MLNs already use a logical
representation for their features. However, MLNs have additional structure to
exploit: each formula is instantiated
many times with different combinations of
constants. For example, suppose we are
modeling a social network in which each
pair of people is either friends or not.
Before introducing any information
about the individuals, the probability
that any two people are friends must be
the same as any other pair. Lifted inference exploits these symmetries to reason
efficiently, even on very large domains. 29
Probabilistic theorem proving (PTP) 8
applies this idea to perform lifted
weighted model counting, so that many
equivalent groundings of the same formula can be counted at once without
instantiating them. As in the propositional case, lifted WMC can also be performed by compiling the first-order
knowledge base to a (lifted) arithmetic
circuit for repeated querying. 5
In some cases, lifted inference lets
us reason efficiently independent of
domain size, so that inferring probabilities over millions of constants and
trillions of ground formulas takes no
longer than reasoning over hundreds.
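As an illustration of this domain-size independence, consider a hypothetical MLN whose only formula is Friends(x, y) with weight w, queried with no evidence: every grounding is exchangeable, so log Z and the per-pair marginal follow in closed form from the grounding count alone.

```python
import math

# Lifted-inference sketch for a toy MLN with the single formula
# Friends(x, y), weight w, and no evidence. The m = n * (n - 1) / 2
# groundings are independent and exchangeable, so no ground network is
# ever built, and the per-pair marginal does not depend on n at all.
w = 1.1

def lifted_query(n):
    m = n * (n - 1) // 2                          # groundings of Friends(x, y)
    log_Z = m * math.log1p(math.exp(w))           # log Z = m * log(1 + e^w)
    marginal = math.exp(w) / (1.0 + math.exp(w))  # P(Friends(a, b)) per pair
    return log_Z, marginal

_, p_small = lifted_query(100)         # 100 constants
_, p_huge = lifted_query(10_000_000)   # 10 million constants: same marginal
```

Evidence about specific constants breaks this symmetry, which is why the gains from lifting depend on the evidence as well as the knowledge base.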
More often, evidence about individual
constants breaks symmetries, so that
different groundings are no longer
exchangeable. The efficiency gains
from lifting depend on both the structure of the knowledge base and the
structure of the evidence.
When there is not enough structure
and symmetry to perform inference
exactly, we can replace some of the recursive conditioning steps in PTP with
sampling. 8 This leads to an approximate
lifted inference algorithm, where sampling is used to estimate the weighted
count of some of the subformulas.
Learning
Here, we discuss methods for automatically learning weights, refining formulas, and constructing new formulas
from data.
Weight learning. In generative
learning, the goal is to learn a joint
probability distribution over all
atoms. A standard approach is to
maximize the likelihood of the data