how to invert a probabilistic function of type ℙ(B|A) = A→ℙ(B) into a probabilistic function of type ℙ(A|B) = B→ℙ(A) using conditioning.
When function inversion is applied to the running example, the probabilistic function ℙredict ∈ ℙ(Weight|Food) can be defined as

ℙredict(food) = (Doctor*CDC) ÷ (_ = food)
Removing all syntactic sugar and using the value/probability pairs implementation amounts to the following:

ℙredict(burger) = [obese↦36, skinny↦18]
ℙredict(celery) = [obese↦4, skinny↦42]
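To make this concrete, here is a minimal C# sketch (the class and member names are mine, not from any library) that desugars ℙredict over the value/probability-pair representation, using the running example's odds introduced below (obese:skinny = 4:6; Doctor(obese) gives burger:celery = 9:1; Doctor(skinny) gives burger:celery = 3:7):

using System;
using System.Collections.Generic;
using System.Linq;

class PredictSketch
{
    // The prior ℙ(Weight) and likelihood ℙ(Food|Weight) as value/weight pairs.
    static readonly List<(string W, double P)> CDC =
        new List<(string, double)> { ("obese", 40), ("skinny", 60) };

    static List<(string F, double P)> Doctor(string w) =>
        w == "obese"
            ? new List<(string, double)> { ("burger", 9), ("celery", 1) }
            : new List<(string, double)> { ("burger", 3), ("celery", 7) };

    // ℙredict(food) = (Doctor*CDC) ÷ (_ = food): weights multiply along each
    // prior/likelihood path; conditioning keeps only the matching pairs.
    static IEnumerable<(string W, double P)> Predict(string food) =>
        from wp in CDC
        from fq in Doctor(wp.W)
        where fq.F == food
        select (wp.W, wp.P * fq.P / 10);

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Predict("burger"))); // (obese, 36), (skinny, 18)
        Console.WriteLine(string.Join(", ", Predict("celery"))); // (obese, 4), (skinny, 42)
    }
}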
In practice, most monads have an
unsafe run function of type ℙ(A)→M(A)
that teleports you out of the monad into
some concrete container M. Mathematicians call this the forgetful functor. For
distributions dist ∈ ℙ(A), a common
way to exit the monad is by picking the
value a ∈ A with the highest probability
in dist. Mathematicians use the higher-order function arg max for this, and
call it MLE (maximum likelihood estimator) or MAP (maximum a posteriori).
In practice it is often more convenient to return the pair a↦p from dist with the highest probability.
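As a sketch (hypothetical names; MaxBy needs .NET 6 or later), exiting the monad by returning the most probable pair is a one-liner:

using System;
using System.Linq;

class MapEstimate
{
    static void Main()
    {
        // ℙredict(burger) from above, as value/weight pairs.
        var dist = new[] { (Value: "obese", P: 36.0), (Value: "skinny", P: 18.0) };

        // arg max over the pairs: the MAP estimate together with its weight.
        var best = dist.MaxBy(vp => vp.P);
        Console.WriteLine(best); // (obese, 36)
    }
}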
A simple way to find the value with
the maximal likelihood from a frequentist representation of a distribution
is to blow up the source distribution
ℙ(A) into a distribution of distributions ℙ(ℙ(A)), where the outer distribution is an infinite frequentist list of inner Bayesian distributions [A↦ℝ], computed by grouping and summing, that over time will converge to the true underlying distribution. Then you can select the nth inner distribution and take its maximum value.
WeightFromFood ∈ Food→[Weight↦ℝ]
WeightFromFood(food) = ℙredict(food)
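Here is one way the blow-up could look in C# (names are hypothetical): the nth inner distribution groups and sums a prefix of the sample stream, and its arg max approximates the MAP estimate:

using System;
using System.Collections.Generic;
using System.Linq;

class Frequentist
{
    // An infinite frequentist stream with odds obese:skinny = 4:6.
    static IEnumerable<string> Samples(Random rng)
    {
        while (true) yield return rng.NextDouble() < 0.4 ? "obese" : "skinny";
    }

    // The nth inner Bayesian distribution [A↦ℝ]: group the first n samples
    // and sum (count) them; as n grows this converges to the true distribution.
    static Dictionary<string, double> Nth(IEnumerable<string> samples, int n) =>
        samples.Take(n)
               .GroupBy(a => a)
               .ToDictionary(g => g.Key, g => (double)g.Count() / n);

    static void Main()
    {
        var nth = Nth(Samples(new Random(42)), 10_000);
        Console.WriteLine(nth.MaxBy(kv => kv.Value)); // [skinny, ≈0.6]
    }
}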
The frequency of occurrence of values in the result will automatically reflect the proper product of values from the prior and likelihood.
For example, if we implement the prior
CDC by an infinite collection with odds obese:skinny = 4:6, the result of Doctor(skinny) by an infinite collection with odds burger:celery = 3:7, and that of Doctor(obese) by a collection with odds burger:celery = 9:1, then sampling from the infinite collection Doctor*CDC, which results from applying the prior to the likelihood, will have the ratio

(obese,burger):(obese,celery):(skinny,burger):(skinny,celery) = 36:4:18:42.
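That ratio is easy to sanity-check by brute force; the following sketch (my code, not the article's) draws a million samples from the prior, pushes each through the matching likelihood, and tallies the pairs:

using System;
using System.Linq;

class RatioCheck
{
    static void Main()
    {
        var rng = new Random(1);
        var counts = Enumerable.Range(0, 1_000_000)
            .Select(_ =>
            {
                // Sample the prior, then the likelihood for that weight.
                var w = rng.NextDouble() < 0.4 ? "obese" : "skinny";
                var pBurger = w == "obese" ? 0.9 : 0.3;
                var f = rng.NextDouble() < pBurger ? "burger" : "celery";
                return (w, f);
            })
            .GroupBy(wf => wf)
            .OrderBy(g => g.Key)
            .Select(g => $"{g.Key}: {100.0 * g.Count() / 1_000_000:F1}%");

        // Prints approximately 36 : 4 : 18 : 42 (as percentages).
        Console.WriteLine(string.Join(Environment.NewLine, counts));
    }
}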
The keen reader will note that (*)
is a slight variation of the well-known
monadic bind operator, which, depending on your favorite programming language, is known under
the names (>>=), SelectMany, or
flatMap. Indeed, probability distributions form a monad. Mathematicians call it the Giry monad, but
Reverend Bayes beat everyone to it by
nearly two centuries.
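For the pair representation, bind is a few lines of C#; a sketch (the extension method and its name are mine):

using System;
using System.Collections.Generic;
using System.Linq;

static class DistMonad
{
    // Monadic bind for distributions-as-pairs; (*) is this with the
    // arguments flipped. Weights multiply along every prior/likelihood path.
    public static IEnumerable<(B Value, double P)> Bind<A, B>(
        this IEnumerable<(A Value, double P)> prior,
        Func<A, IEnumerable<(B Value, double P)>> likelihood) =>
        prior.SelectMany(ap => likelihood(ap.Value),
                         (ap, bq) => (bq.Value, ap.P * bq.P));

    // Example: CDC.Bind(Doctor) yields ℙ(Food) as pairs with the
    // products already multiplied through (before any grouping).
}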
Note that as formulated, Bayes’
rule has a type error that went unnoticed for centuries. The left-hand
side returns a distribution of pairs
ℙ(A&B), while the right-hand side returns a distribution of pairs ℙ(B&A).
Not a big deal for mathematicians
since & is commutative. For brevity we’ll be sloppy about this as well.
Since we often want to convert from
ℙ(A&B) to ℙ(A) or ℙ(B) by dropping
one side of the pair, we prefer the C# variant of SelectMany that takes a combiner function A⊕B ∈ C to post-process the pair of samples from the prior and likelihood:

from a↦p in prior
from b↦q in likelihood(a)
select a⊕b↦p*q
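In C#, this is the SelectMany overload that takes a result selector; over the pair representation a sketch looks like this (again, my names):

using System;
using System.Collections.Generic;
using System.Linq;

static class DistCombine
{
    // SelectMany with a combiner A⊕B∈C: post-processes the two samples
    // while the probabilities p and q are multiplied behind the scenes.
    public static IEnumerable<(C Value, double P)> SelectMany<A, B, C>(
        this IEnumerable<(A Value, double P)> prior,
        Func<A, IEnumerable<(B Value, double P)>> likelihood,
        Func<A, B, C> combine) =>
        prior.SelectMany(ap => likelihood(ap.Value),
                         (ap, bq) => (combine(ap.Value, bq.Value), ap.P * bq.P));
}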
Now that we know that (*) is monadic bind, we can start using syntactic
sugar such as LINQ queries or for/do monad comprehensions. All that is
really saying is that it is safe to drop
the explicit tracking of probabilities
from any query written over distributions (that is, the code on the left in
Figure 4 is simply sugar for the code
on the right, which itself can be alternatively implemented with the frequentist approach using sampling).
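The reason the sugar works is that C# desugars query syntax into calls to Select and SelectMany, so wrapping the pairs in an abstract type and giving it those two methods is all it takes to make queries probability-blind. A minimal sketch (the Dist type is hypothetical):

using System;
using System.Collections.Generic;
using System.Linq;

// A distribution type that hides its value/weight pairs.
public record Dist<A>(IEnumerable<(A Value, double P)> Pairs);

public static class DistQueries
{
    public static Dist<B> Select<A, B>(this Dist<A> d, Func<A, B> f) =>
        new Dist<B>(d.Pairs.Select(ap => (f(ap.Value), ap.P)));

    // Once this overload exists, LINQ queries over Dist<A> compile, and the
    // p*q bookkeeping of Figure 4 (right) happens behind the sugar (left).
    public static Dist<C> SelectMany<A, B, C>(
        this Dist<A> prior, Func<A, Dist<B>> likelihood, Func<A, B, C> combine) =>
        new Dist<C>(from ap in prior.Pairs
                    from bq in likelihood(ap.Value).Pairs
                    select (combine(ap.Value, bq.Value), ap.P * bq.P));

    // Usage: from w in cdc from f in doctor(w) select (w, f)
    // has type Dist<(string, string)> and never mentions probabilities.
}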
Another way of saying this is that we
can use query comprehensions as a DSL
(domain-specific language) for specifying
probabilistic functions. This opens the
road to explore other standard query operators besides application that can work
over distributions and that can be added
to our repertoire. The first one that comes to mind is filtering, or conditioning as mathematicians prefer to call it.
Given a predicate (A→𝔹), we can drop all values in a distribution for which the predicate does not hold using the division operator (÷):

ℙ(A) ÷ (A→𝔹) ∈ ℙ(A)
prior ÷ condition = from a in prior where condition(a) select a
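Over the pair representation this division is just a where clause; a sketch (my names, weights left unnormalized as in the running example):

using System;
using System.Collections.Generic;
using System.Linq;

static class Filtering
{
    // prior ÷ condition: drop every value for which the predicate fails.
    public static IEnumerable<(A Value, double P)> Divide<A>(
        this IEnumerable<(A Value, double P)> prior, Func<A, bool> condition) =>
        from ap in prior where condition(ap.Value) select ap;

    // Example: filtering the pairs of Doctor*CDC on food = burger keeps
    // exactly the pairs behind ℙredict(burger) = [obese↦36, skinny↦18].
}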
The traditional division of a distribution ℙ(A&B) by distribution ℙ(B)
can be defined similarly as
joint ÷ evidence = λb. from (a,b) in joint
                       from b' in evidence
                       where b = b' select a
We can show that (f*d)÷d = f. Applying the latter version to Bayes' rule results in the following equivalence:

ℙ(A|B) = ℙ(B|A)*ℙ(A) ÷ ℙ(B)
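Transcribing the definition above into C# over the pair representation gives something like this sketch (my names; equality via object.Equals for brevity):

using System;
using System.Collections.Generic;
using System.Linq;

static class JointDivision
{
    // ℙ(A&B) ÷ ℙ(B) ∈ B→ℙ(A): for each observed b, keep the joint pairs
    // whose B-side agrees with a sample b' drawn from the evidence, and
    // return the A-side, multiplying the weights along the way.
    public static Func<B, IEnumerable<(A Value, double P)>> Divide<A, B>(
        this IEnumerable<((A a, B b) Value, double P)> joint,
        IEnumerable<(B Value, double P)> evidence) =>
        b => from abp in joint
             from bq in evidence
             where Equals(abp.Value.b, bq.Value) && Equals(bq.Value, b)
             select (abp.Value.a, abp.P * bq.P);
}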
In practice, it is most convenient to
use query comprehensions directly instead of operators, and write code like
ℙosterior ∈ ℙ(C|B) = B→ℙ(C)
ℙosterior(b) = from a in prior
               from b' in likelihood(a)
               where b = b'
               select a⊕b'
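Spelled out generically over the pair representation, that query becomes the following sketch (my names; the combiner argument is how I produce the C in ℙ(C|B)):

using System;
using System.Collections.Generic;
using System.Linq;

static class PosteriorQuery
{
    // ℙosterior ∈ ℙ(C|B) = B→ℙ(C), built from a prior ℙ(A), a likelihood
    // ℙ(B|A), and a combiner A⊕B∈C, by conditioning on the observed b.
    public static Func<B, IEnumerable<(C Value, double P)>> Posterior<A, B, C>(
        IEnumerable<(A Value, double P)> prior,
        Func<A, IEnumerable<(B Value, double P)>> likelihood,
        Func<A, B, C> combine) =>
        b => from ap in prior
             from bq in likelihood(ap.Value)
             where Equals(bq.Value, b)
             select (combine(ap.Value, bq.Value), ap.P * bq.P);

    // Posterior(CDC, Doctor, (w, f) => w)("burger") recovers
    // ℙredict(burger) = [obese↦36, skinny↦18] up to scaling.
}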
Whichever way you spin it, this
is incredibly cool! Bayes’ rule shows
Figure 4. Syntactic sugar.

from a in prior              from a↦p in prior
from b in likelihood(a)      from b↦q in likelihood(a)
select a⊕b                   select a⊕b↦p*q