tion of ACM conference participants in terms of ethnic origin and sex as determined by Genderyzer. This technique allowed us to formulate 95% confidence intervals for the sex composition of the names the software identified as “ambiguous” and “unknown.”f

f To be conservative, we used the lower bound of the confidence interval for our analysis.

The following example clarifies the method we used for determining the gender composition in a sample of names. Assume we have a random sample of 100 names drawn from a population with a known 2:2:1 American:Asian:international ratio and a known 3:1 male:female ratio. Based on our testing, we know that Genderyzer on average correctly identifies sex for 70% of the names and places the
rest in one of three categories: “unknown,” “ambiguous,” and “initials” (rather than full names). We repeatedly drew random samples of 100 names (50 samples in all) from the combined data sets to calculate the average number of female names in the “ambiguous” and “unknown” categories, together with 95% confidence intervals. Since we had a priori knowledge of each name's sex and country of origin, we could determine that, in such a sample, on average 40% of the names in the “unknown” category and 45% of the names in the “ambiguous” category belonged to females. We used the results from our random trials to make well-supported assumptions about the sex distribution in the “ambiguous” and “unknown” categories, increasing the percentage of names whose sex we could estimate.

Figure 3. Women as a percentage of all authors, of Ph.D. graduates, and of cumulative Ph.D. graduates (series: female authors, female Ph.D. graduates, female cumulative Ph.D. graduates; y-axis 0% to 30%).
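The resampling procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code. The labeled pool `POOL` and its proportions are hypothetical, chosen to echo the worked example: 70% correctly classified, 3:1 male to female, 40% and 45% female in “unknown” and “ambiguous.” The function names and the fixed seed are also hypothetical. The text does not say how its intervals were computed, so the normal-approximation interval (mean ± 1.96 standard errors across trials) is likewise an assumption. The sketch tracks the female share within each category rather than the raw count.

```python
import random
import statistics

# Hypothetical labeled pool of (classifier_label, true_sex) pairs. In the study the
# true sex and country of origin were known a priori; these proportions are illustrative.
POOL = (
    [("female", "F")] * 300 + [("male", "M")] * 1100          # 70% correctly classified
    + [("unknown", "F")] * 80 + [("unknown", "M")] * 120      # 40% female
    + [("ambiguous", "F")] * 90 + [("ambiguous", "M")] * 110  # 45% female
    + [("initials", "F")] * 30 + [("initials", "M")] * 170
)

def female_share_by_trial(pool, category, n_trials=50, sample_size=100, seed=1):
    """For each random sample, return the fraction of names in `category` that are female."""
    rng = random.Random(seed)
    shares = []
    for _ in range(n_trials):
        sample = rng.sample(pool, sample_size)  # draw without replacement
        in_cat = [sex for label, sex in sample if label == category]
        if in_cat:  # skip the (rare) sample with no names in this category
            shares.append(in_cat.count("F") / len(in_cat))
    return shares

def mean_with_ci95(values):
    """Mean and a normal-approximation 95% confidence interval for that mean."""
    mean = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / len(values) ** 0.5
    return mean, mean - half_width, mean + half_width

for category in ("unknown", "ambiguous"):
    mean, low, high = mean_with_ci95(female_share_by_trial(POOL, category))
    # Following the footnote, the conservative estimate would use the lower bound `low`.
    print(f"{category}: mean {mean:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

Using a seeded `random.Random` makes the trials reproducible. Swapping in the real labeled names would only require replacing `POOL`.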
We had no knowledge of the actual distribution of “initials,” so we assumed that names in that category had the same gender composition as the weighted average of our four other categories in the same year. This assumption could lead to an underestimate of women’s representation if women use initials more often than men to avoid possible gender bias. We chose to be conservative in our estimation.
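The estimation rules above can be combined into one calculation. The following is a minimal sketch, not the authors' implementation. It assumes the four non-initials categories are “female,” “male,” “unknown,” and “ambiguous.” The default rates are the worked example's averages (40% and 45%); the footnote's conservative approach would substitute the lower confidence bounds. The function name and the counts in the usage line are hypothetical.

```python
def estimated_female_share(counts, female_rate_unknown=0.40, female_rate_ambiguous=0.45):
    """Return (estimated female count, estimated female share) for one year's names.

    counts: hypothetical mapping with keys "female", "male", "unknown",
    "ambiguous", and "initials" giving the classifier's counts per category.
    """
    non_initials = counts["female"] + counts["male"] + counts["unknown"] + counts["ambiguous"]
    # Split "unknown" and "ambiguous" names using the assumed female rates.
    female_non_initials = (
        counts["female"]
        + female_rate_unknown * counts["unknown"]
        + female_rate_ambiguous * counts["ambiguous"]
    )
    # Initials get the weighted-average female share of the other four categories.
    share_non_initials = female_non_initials / non_initials
    female_total = female_non_initials + share_non_initials * counts["initials"]
    return female_total, female_total / (non_initials + counts["initials"])

year_counts = {"female": 20, "male": 50, "unknown": 10, "ambiguous": 10, "initials": 10}
print(estimated_female_share(year_counts))
```

Under this assumption the “initials” category never changes the estimated female share. It only scales the estimated number of women, which is why the choice is conservative when women use initials disproportionately.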
Figure 4. Number of papers published by individual men, by individual women, and co-authored by multiple authors, 1967 to 2007 (series: men, women, co-authored; y-axis 0 to 12,000 papers).
Papers and authors increased exponentially
The annual number of conference papers published by ACM, as represented in our data set, grew from 149 in 1966 to 12,222 in 2008. This increase is hardly surprising, given the phenomenal growth and differentiation of computing, as well as the general growth in academic publishing.13 The number of authors grew even more dramatically, from 389 to 37,944 during the same period. This difference in the growth rates of papers and authors is explained by the increasing prevalence of collaborative authorship: in 1966, papers had on average 2.6 authors, but by 2008 they had on average 3.1 authors. Over those 43 years, most authors were men, though women authors were increasingly prevalent in recent years. In 2008, there were approximately 2.3 male authors and 0.8 female authors per published ACM conference paper.
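The per-paper averages quoted above follow directly from the reported totals, as this short Python check shows. It uses only the 1966 and 2008 figures from the text. The implied constant annual growth rate is an illustrative addition, not a figure from the study. It assumes smooth exponential growth between the two endpoints, which the yearly data need not follow.

```python
# Totals reported in the text.
PAPERS = {1966: 149, 2008: 12_222}
AUTHORS = {1966: 389, 2008: 37_944}

def authors_per_paper(year):
    """Average number of authors per paper in a given year."""
    return AUTHORS[year] / PAPERS[year]

def implied_annual_growth(series, start=1966, end=2008):
    """Constant yearly growth rate that would take series[start] to series[end].

    Illustrative only: assumes smooth exponential growth between the endpoints.
    """
    return (series[end] / series[start]) ** (1 / (end - start)) - 1

for year in (1966, 2008):
    print(year, f"{authors_per_paper(year):.1f} authors per paper")
for name, series in (("papers", PAPERS), ("authors", AUTHORS)):
    print(name, f"{implied_annual_growth(series):.1%} per year")
```

The 2008 average also agrees with the split by sex given in the text: 2.3 male plus 0.8 female authors equals 3.1 authors per paper.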
Women’s authorship increased as women earned more Ph.D. degrees. Figure 2 reflects the substantial increases in