cannot overlap too much. As shown in Schilling et al.,25 for
the two distributions to be sufficiently far apart, the distance
between the means of the two distributions needs to exceed
2σ. This, however, assumes that the two distributions have
the same variance. More formally, if the two subdistributions
do not have the same variance, then for their sum to be
bimodal, the following must hold26:
92 COMMUNICATIONS OF THE ACM | JANUARY 2020 | VOL. 63 | NO. 1
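As a rough numerical check of the equal-variance 2σ rule described above, the following sketch counts the local maxima of an equal mixture of two normal densities on a fine grid. The function names, the grid step, and the chosen separations are illustrative assumptions, and the sketch only covers the equal-variance case, not the more general unequal-variance condition.

```python
import math


def normal_pdf(x, mean, sigma):
    """Density of a normal distribution with the given mean and sigma."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def count_modes(separation, sigma=1.0, step=0.01):
    """Count local maxima of an equal mixture of two normals.

    The two components share the same sigma and have means that are
    `separation` apart, placed symmetrically around zero.
    """
    half = separation / 2.0
    span = half + 6.0 * sigma              # cover both tails
    n = int(round(span / step))
    xs = [i * step for i in range(-n, n + 1)]
    ys = [0.5 * (normal_pdf(x, -half, sigma) + normal_pdf(x, half, sigma))
          for x in xs]
    # A grid point is a mode if it is higher than its left neighbour
    # and at least as high as its right neighbour.
    return sum(1 for i in range(1, len(ys) - 1)
               if ys[i - 1] < ys[i] >= ys[i + 1])


for d in (1.0, 1.5, 3.0, 4.0):
    print(f"separation = {d:.1f} sigma -> {count_modes(d)} mode(s)")
```

Separations well below 2σ should yield a single mode and separations well above 2σ should yield two; values very close to 2σ are sensitive to the grid step and are best avoided in a check this crude.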
of CS grades put forth by Lister is that the grades are not, in
fact, bimodal.15 Lister observed that CS grade distributions
are generally noisy, and in line with what statisticians would
accept as normally distributed. Lister argued that the perception of bimodal grades results from instructors’ beliefs in the
Geek Gene Hypothesis, and hence, instructors see bimodality where there is none.15
Lister’s argument was theoretical,
and based on statistical theory; in this paper, we test his argument by statistically analyzing actual grade distributions.
2.1. Histograms can deceive
2. WHAT IS A BIMODAL DISTRIBUTION?
To properly tackle the question of “are CS grades bimodal?”,
we should first clearly establish what bimodality means. For
a comprehensive discussion of this, we suggest the reader
consult Schilling et al.25; we summarize some major points of that article in
this section.
Consider this histogram of sepal widths for the Iris species
versicolor, taken from the Wikipedia page on “normal
distribution”.
The data has two peaks, but the data is considered to be
sampled from a normal distribution. If we were to try and
model this data as the mixture of two normal distributions,
the two subdistributions would be too close together to produce two distinct peaks. The simplest way to model this data
is as a normal distribution, especially as this is consistent
with biological theory.
Most standard continuous probability distributions have
a mean, a median, a mode, and some measure of the distribution’s
width (variance). Standard distributions include the
normal (Gaussian), Pareto, Poisson, Cauchy, Student’s t, and
logistic distributions. When we plot them (or, more likely, a sample
thereof) with a histogram, we see their probability density. All of
these distributions have a single mode, and have a probability
density that can be modeled with a function that has a
single term. For example, the normal distribution’s PDF is
$$f(x) = a\,e^{-\frac{(x-b)^2}{2c^2}}$$
In this function, a represents the height of the curve’s peak,
b is the position of the center of the peak, and c represents
the width of the curve.
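To make the roles of a, b, and c concrete, here is a minimal sketch of the single-term Gaussian function, a·exp(−(x − b)² / (2c²)). The function names and the example values (center 70, width 10) are illustrative assumptions. Choosing a = 1 / (c·√(2π)) is the standard normalization that makes the curve a probability density with total area 1.

```python
import math


def gaussian(x, a, b, c):
    """Single-term Gaussian: peak height a, center b, width c."""
    return a * math.exp(-((x - b) ** 2) / (2.0 * c ** 2))


def normal_pdf(x, mean, sd):
    """Normal density: a Gaussian whose height is fixed by its width."""
    a = 1.0 / (sd * math.sqrt(2.0 * math.pi))
    return gaussian(x, a, mean, sd)


# The curve reaches its maximum height a at x = b.
peak_height = gaussian(70.0, 0.5, 70.0, 10.0)

# Riemann-sum approximation of the area under the normal density,
# taken over 7 standard deviations on each side of the mean.
step = 0.01
area = sum(normal_pdf(i * step, 70.0, 10.0) * step for i in range(14_001))

print(peak_height)
print(round(area, 6))
```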
In contrast, a bimodal distribution has two distinct
modes. A ‘multimodal’ distribution is any distribution with
multiple distinct modes (two or more). For instance, consider
these two examples from Schilling et al.25 Both are created by the equal
mixture of two triangular distributions (solid lines). The
sums are shown with dashed lines:
As we can see, when the two subdistributions are far away
Remember that what we see in a histogram is a result of
how we select the bins. It is possible to bin this data in a way
that does not have two ‘peaks’ (for example, by using larger
bin intervals, or shifting the bin boundaries). With grade
distributions, ceiling effects are common: if you take normally
distributed data, and then cap the values above
100% at 100%, you may wind up seeing a second “peak”
in your histogram’s top bin. For an illustration, see distribution
6 in Figure 1.
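The ceiling effect can be sketched with simulated data. The example below is not the authors' analysis: the mean of 75, the standard deviation of 15, the sample size, the seed, and the 5-point bins are all assumptions chosen for illustration. It draws normally distributed scores, clamps them to the 0 to 100 range, and prints a text histogram.

```python
import random

random.seed(0)  # arbitrary seed so the sketch is reproducible

# Assumed, illustrative parameters (not from the study): raw scores are
# normally distributed with mean 75 and standard deviation 15.
raw_scores = [random.gauss(75, 15) for _ in range(10_000)]

# Ceiling (and floor) effect: anything outside 0..100 is clamped.
grades = [min(max(score, 0.0), 100.0) for score in raw_scores]

BIN_WIDTH = 5
N_BINS = 100 // BIN_WIDTH


def histogram(values):
    """Count values in bins of BIN_WIDTH points; 100 falls in the top bin."""
    counts = [0] * N_BINS
    for value in values:
        index = min(int(value // BIN_WIDTH), N_BINS - 1)
        counts[index] += 1
    return counts


counts = histogram(grades)
for index, count in enumerate(counts):
    low = index * BIN_WIDTH
    print(f"{low:3d}-{low + BIN_WIDTH:3d}: {'#' * (count // 25)}")
```

With these settings the 95-100 bin should come out taller than the 90-95 bin, because every raw score above 100 is piled into it. Changing `BIN_WIDTH` to 10 would be expected to hide that bump, which is the binning sensitivity described above.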
3. STUDY 1: GRADES ANALYSIS
(example a), we get a distribution with two peaks. But when
the two subdistributions are close together (example b), they
add together to form a plateau, with a single peak. Example
a is considered bimodal; example b is not.
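The far-apart versus close-together behavior can be reproduced with a small sketch. It assumes symmetric triangular components of equal half-width; the function names, centers, and widths are illustrative and are not taken from Schilling et al. Rather than counting peaks directly, it checks whether the mixture dips between the two component centers, since a plateau has no dip.

```python
def triangular_pdf(x, center, half_width):
    """Density of a symmetric triangular distribution around `center`."""
    distance = abs(x - center)
    if distance >= half_width:
        return 0.0
    return (1.0 - distance / half_width) / half_width


def mixture_pdf(x, center_a, center_b, half_width):
    """Equal-weight mixture of two symmetric triangular densities."""
    return 0.5 * (triangular_pdf(x, center_a, half_width)
                  + triangular_pdf(x, center_b, half_width))


def has_two_peaks(center_a, center_b, half_width, tol=1e-9):
    """True if the mixture is lower at the midpoint than at a center."""
    midpoint = (center_a + center_b) / 2.0
    peak = mixture_pdf(center_a, center_a, center_b, half_width)
    valley = mixture_pdf(midpoint, center_a, center_b, half_width)
    return valley < peak - tol


far_apart = has_two_peaks(0.0, 1.5, 1.0)       # in the spirit of example a
close_together = has_two_peaks(0.0, 0.5, 1.0)  # in the spirit of example b
print(far_apart, close_together)
```

In the close-together case the falling side of one triangle and the rising side of the other cancel exactly between the centers, which is what produces the flat plateau.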
The same is true for normal distributions (also from
Schilling et al.25):
For a distribution to be bimodal, the subdistributions
Are CS grades bimodal, or unimodal? To test this, we
acquired the final grade distributions for every undergraduate
CS class at the University of British Columbia (UBC),
from 1996 to 2013. This represents 778 different lecture sections,
containing 30,214 final grades (average class size: 75).
We analyzed this data to see which distribution(s) it
most likely came from. Frequentist null-hypothesis testing
is the standard in computer science education research; for
readers who are unfamiliar with interpreting p values from
null-hypothesis tests, we recommend consulting Goodman.
3.1. Testing for multimodality
We began by computing the kurtosis for each class. Kurtosis
is a measure of how ‘tailed’ the data is: high kurtosis means
a distribution has a sharp peak and long tails, whereas low
kurtosis implies low peak(s) and short tails.
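A minimal sketch of how kurtosis separates these shapes, using simulated data rather than the UBC grades. The sample sizes, means, standard deviations, and seed are assumptions for illustration. The function computes excess kurtosis (the fourth standardized moment minus 3), so a normal sample lands near 0 and a well-separated two-component mixture lands clearly below 0.

```python
import random


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


random.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical unimodal class: one normal component.
unimodal = [random.gauss(70, 10) for _ in range(20_000)]

# Hypothetical bimodal class: equal mixture of two normals, 4 sigma apart.
bimodal = [random.gauss(random.choice((50, 90)), 10) for _ in range(20_000)]

k_unimodal = excess_kurtosis(unimodal)
k_bimodal = excess_kurtosis(bimodal)
print(f"unimodal: {k_unimodal:+.2f}")
print(f"bimodal:  {k_bimodal:+.2f}")
```

Some libraries report raw kurtosis (normal = 3) rather than excess kurtosis (normal = 0), so it is worth checking which convention a given tool uses before comparing values.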
If you look back at the illustration of adding two normal