such that:
1. E[F] » E[ f ], i.e., f approximates F well on average;
2. E [F] » E [ f ], i.e., f approximates F well on average, even
mm
when restrict ourselves to inputs drawn according to m.
Then use (1) to conclude that
with F almost everywhere. It turns out that an “error” func-
tion E (x) that “flags” the problematic locations, i.e., E (x) = 1
whenever f (x) ≠ F(x), can be computed by (a slightly big-
ger) AC0 circuit. Thus, it the function f *(x) = (1 − E (x) ) · f (x)
were a polynomial, we would be in great shape: f *(x) = F(x)
most of the time, while even when they disagree, the dif-
ference | f *(x) − F(x)| £ 1, and thus in fact E[ f *] » E[F] and
E [ f *] » E [F].
mm
E [F] » E [ f ] = E[ f ] » E[F],
mm
i.e., that m ε-fools F. Of course, it is not clear at all that such
a polynomial must exist (and we do not know whether it
does). However, a carefully crafted attack along these lines
actually does go through. The proof combines two previous constructions of approximating polynomials for AC0
circuits.
Of course, f is not a polynomial, since E is not a polynomial. However, it turns out with a little bit of work that the
polynomial f ¢ = (1 −E ) · f, where E is the Linial–Mansour–Nisan
˜˜
approximation of E actually allows us to prove the main
theorem. Thus the proof comes out of combining the two
approximation techniques in one polynomial.
The first construction is due to Linial, Mansour, and
Nisan8 and is discussed extensively in Section 3. It is an
analytic construction that for each AC0 function F provides
˜
a low-degree polynomial f that approximates it well on
average. Such a polynomial would easily fulfill require-
ment 1 above. However, it would not be helpful at fulfilling
requirement 2. Since the support of m is quite small, being
close on average does not rule out the possibility that F and
˜
f deviate wildly from each other on the points of the sup-
port of the distribution m. This type of uniform average-
case approximation is inherently useless when we want F
˜
to be close to f on samples drawn according to m.
3. Lo W-DePth ciRcuits anD PoLYnomiaLs—
the anaLYtic connection
Approximately a decade after the first AC0 lower bounds
were published, Linial, Mansour, and Nisan proved the fol-
lowing remarkable lemma:
Lemma 2. If F: {0, 1}n → {0, 1}
is a boolean function com-putable by a depth-d circuit of size m, then for every t there is a
degree t polynomial f ˜with8
The second construction is a combinatorial one, and is
due to Razborov11 and Smolensky.13 It is discussed in Section
4. For an AC0 function F, and any distribution of inputs ν, the
Razborov–Smolensky construction gives a low-degree polynomial f such that F = f with high probability with respect
to ν. In other words,
The lemma states that any low-depth circuit of size m
can be approximated very well by a polynomial of degree
(log m)O(d)—a degree that is poly-logarithmic in the size of
the circuit. The approximation error here is the average-
˜
square error: f doesn’t have to agree with F on any point,
but the average value of |F(x) − f (x)|2 when taken over the
Hamming cube is small. An illustration of the statement of
the lemma can be found in Figure 1.
ν [{x : f (x) ≠ F (x)}] < ε.
Note that here f is actually equal to F most of the time (and is
not just close to it). Since ν is allowed to be any distribution,
we can take
to be the hybrid between m and the uniform distribution,
thus guaranteeing that the probability that f (x) ≠ F(x) is
small simultaneously with respect to both distributions.
This seems to be a promising improvement over the Linial–
Mansour–Nisan polynomials, that may allow us to fulfill
both requirements 1 and 2 above.
To understand the context of Lemma 2 we need to consider one of the most fundamental tools in the analysis
of boolean functions: the Fourier transform, which we
will briefly introduce here. An introductory survey on the
Fourier transform over the boolean cube can be found in de
Wolf.4 For the remainder of this section we will represent
the boolean function F: {0, 1}n → {0,1} using a function
G: {−1, +1}n → {−1, +1} with 0 corresponding to −1 and 1 to +1.
Thus
G(x , . . . , x ) := 2F ((x + 1)/2, . . . , (x + 1)/2) − 1.
1n 1 n
This will not affect any of the results, but will make all the
The caveat here is that on the few locations where f (x) ≠
F(x) the deviation | f (x) − F(x)| may be huge. In particular,
while f is close to F (more precisely, it is perfectly equal) almost
everywhere, it is not necessarily the case that they are close on
average. Thus, for example, E[ f ] may differ wildly from E[F].
figure 1. an illustration of the statement of Lemma 2. for
convenience, the boolean cube is represented by the real line.
the function F (in gray) is an ac0 boolean function. the function f
˜
(in black) is a real-valued low-degree polynomial approximating
F well on average.
We see that both polynomial approximations get
˜
1
us “part-way” toward the goals, but do not quite work.
To make the proof work we need to combine the two
approaches as described in Section 5. In a nutshell, we
first take a Razborov–Smolensky polynomial f that agrees
f
F
0