(<= A4 A5)
(<= A6 A7)
Having the labels in chronological
order, we can constrain the age of the
person in the middle to be the median:
(= A4 30)
Sugar has an “if” function that allows encoding constraints for a subset
of the population. Recall that statistic 2B
contains three constraints: there are
three males, their median age is 30, and
their average age is 44. The value 0 represents a female, and 1 represents a male:
#define FEMALE 0
#define MALE 1
Using the variable Sn to represent
the sex of person n, we then have the
constraint:
S1 + S2 + S3 + S4 + S5 + S6 + S7 =
3
This can be represented as:
(= (+ S1 S2 S3 S4 S5 S6 S7)
3)
As illustrated in Figure 1, the if
function allows a straightforward way
to create a constraint for the mean age
44 of male persons.
Table 1 translates into 164 individual
s-expressions extending over 457 lines.
Sugar then translates this into a single
Boolean formula consisting of 6,755
variables arranged in 252,575 clauses.
This format is called the CNF (
conjunctive normal form) because it consists of
many clauses that are combined using
the Boolean AND operation.
Interestingly, we can even create
constraints for the suppressed data.
Statistic 3A is suppressed, so we know
SAT solvers can process only those sys-
tems that have Boolean variables, so Sug-
ar transforms the s-expressions into a
much larger set of Boolean constraints.
For example, each age variable is encod-
ed using unary notation as 126 Boolean
variables. Using this notation, the deci-
mal value 0 is encoded as 126 false Bool-
ean variables, the decimal value 1 is en-
coded as 1 true and 125 false values, and
so on. Although this conversion is not
space efficient, it is fast, provided that
the integers have a limited range.
To encode the median age con-
straint, the median of a group of num-
bers is precisely defined as the value of
the middle number when the numbers
are arranged in sorted order (for the
case in which there is an odd number of
numbers). Until now, persons 1 through
7 have not been distinguished in any
way: the number labels are purely arbi-
trary. To make it easier to describe the
median constraints, we can assert the
labels must be assigned in order of age.
This is done by introducing five con-
straints, which has the side effect of
eliminating duplicate answers that have
simply swapped records, an approach
called breaking symmetry.
12
(<= A1 A2)
(<= A2 A3)
(<= A3 A4)
Figure 1. Encoding statistic 2B, that the average male age is 44, with Sugar’s “if”
statement.
(= (+ (if (= S1 MALE) A1 0) ; average male age = 44
(if (= S2 MALE) A2 0)
(if (= S3 MALE) A3 0)
(if (= S4 MALE) A4 0)
(if (= S5 MALE) A5 0)
(if (= S6 MALE) A6 0)
(if (= S7 MALE) A7 0)
)
(* 3 44))
Figure 2. Encoding the suppressed statistic 3A, that there are between 0 and 2 single
adults.
Let Mn represent the marital status of person n:
#define SINGLE 0
#define MARRIED 1
(int SINGLE_ADULT_COUNT 0 2)
(= (+ (if (and (= M1 SINGLE) (> A1 17)) 1 0)
(if (and (= M2 SINGLE) (> A2 17)) 1 0)
(if (and (= M3 SINGLE) (> A3 17)) 1 0)
(if (and (= M4 SINGLE) (> A4 17)) 1 0)
(if (and (= M5 SINGLE) (> A5 17)) 1 0)
(if (and (= M6 SINGLE) (> A6 17)) 1 0)
(if (and (= M7 SINGLE) (> A7 17)) 1 0))
SINGLE_ADULT_COUNT)
(>= SINGLE_ADULT_COUNT 0)
(<= SINGLE_ADULT_COUNT 2)
Table 4. A single satisfying assignment.
Age Sex Race Marital Status Solution #1
8 F B S 8FBS
18 M W S 18MWS
24 F W S 24FWS
30 M W M 30MWM
36 F B M 36FBM
66 F B M 66FBM
84 M B M 84MBM
Table 5. Solutions without statistic 4A.
Solution #1 Solution #2
8FBS 2FBS
18MWS 12MWS
24FWS 24FWM
30MWM 30MBM
36FBM 36FWS
66FBM 72FBM
84MBM 90MBM