by a major cloud provider will predict with 95% certainty that I have hair, but is less confident about whether or not I am professional (Figure 1).
The implication of incorporating learned models in human-written code is that you cannot get around the fact that the building blocks from which humans compose applications are fundamentally probabilistic. This is a challenge for mainstream programming languages, which all assume that computations are precise and deterministic. Fortunately, the 18th-century Presbyterian minister Thomas Bayes anticipated the need for dealing with uncertainty and formulated Bayes’ rule:⁶
ℙ(A|B)*ℙ(B) = ℙ(A&B) = ℙ(B|A)*ℙ(A)
As it turns out, Bayes’ rule is exactly what the doctor ordered when it comes to bridging the gap between ML and contemporary programming languages.
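To ground the formula before the functional-programming reading, here is a minimal Haskell sketch of Bayes’ rule as bare arithmetic; the function name posterior and all numeric values are made up for illustration:

    -- Bayes’ rule rearranged: ℙ(A|B) = ℙ(B|A) * ℙ(A) / ℙ(B).
    posterior :: Double -> Double -> Double -> Double
    posterior pBgivenA pA pB = pBgivenA * pA / pB

    main :: IO ()
    main =
      -- With made-up values ℙ(A)=0.3, ℙ(B)=0.5, ℙ(B|A)=0.8, both sides of
      -- the rule yield the same joint probability:
      -- ℙ(A&B) = 0.8 * 0.3 = 0.24 = 0.48 * 0.5.
      print (posterior 0.8 0.3 0.5)  -- prints 0.48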
Many of the mathematical explanations of Bayes’ rule are deeply confusing for the working computer scientist, but, remarkably, when interpreted from a functional programming point of view, Bayes’ rule is a theorem about composability and invertibility of monadic functions. Let’s break down Bayes’ rule piece by piece and rebuild it slowly based on developer intuition.
Probability Distributions
First let’s explore what probability distributions ℙ(A) are. The Wikipedia definition, “a probability distribution is a mathematical description of a random phenomenon in terms of the probabilities of events,” is rather confusing from a developer perspective. If you click around for a bit, however, it turns out that a discrete distribution is just a generic list of pairs of values and probabilities, ℙ(A) = [A ↦ ℝ], such that the probabilities add up to 1. This is the Bayesian representation for distributions.
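As a concrete rendering of that definition, the following is a minimal Haskell sketch; the names (Dist, isNormalized, hair) are hypothetical, not from the article:

    -- The Bayesian representation: a distribution over a is a generic list
    -- of value/probability pairs whose probabilities add up to 1.
    newtype Dist a = Dist [(a, Double)] deriving Show

    -- The defining invariant, checked up to floating-point slack.
    isNormalized :: Dist a -> Bool
    isNormalized (Dist ps) = abs (sum (map snd ps) - 1) < 1e-9

    -- Example: the 95%-certain "hair" prediction from Figure 1.
    hair :: Dist Bool
    hair = Dist [(True, 0.95), (False, 0.05)]

Here isNormalized hair evaluates to True, mirroring the add-up-to-1 condition above.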
Isomorphically, you can use the frequentist representation of distributions as infinite lists of type dist ∈ [A], as n gets larger, sam-