students in a lab setting with a custom-designed language and
IDE. Our study, by contrast, is a field study of popular software
applications. While we can only indirectly (and post facto)
control for confounding factors using regression, we benefit
from much larger sample sizes, and more realistic, widely-used software. We find that statically typed languages in general are less defect-prone than dynamically typed languages, and that
disallowing implicit type conversion is better than allowing
it, in the same regard. The effect sizes are modest; it could
be reasonably argued that they are visible here precisely
because of the large sample sizes.
Harrison et al.8 compared C++, a procedural language,
with SML, a functional language, finding no significant difference in total number of errors, although SML has higher
defect density than C++. SML was not represented in our data,
which, however, suggest that functional languages are generally less defect-prone than procedural languages. Another line
of work primarily focuses on comparing development effort
across different languages.12, 20 However, they do not analyze
language defect proneness.
(2) Surveys. Meyerovich and Rabkin16 survey developers’
views of programming languages, to study why some languages
are more popular than others. They report strong influence
from non-linguistic factors: prior language skills, availability of open source tools, and existing legacy systems. Our
study confirms that the availability of external tools also
impacts software quality; for example, concurrency bugs in
Go (see RQ4 in Section 3).
(3) Repository mining. Bhattacharya and Neamtiu1 study
four projects developed in both C and C++ and find that the
software components developed in C++ are in general more
reliable than C. We find that both C and C++ are more defect-prone than average. However, for certain bug types like concurrency errors, C is more defect-prone than C++ (see RQ4
in Section 3).
5. THREATS TO VALIDITY
We recognize a few threats to our reported results. First, to identify bug fix commits we rely on the keywords that developers
often use to indicate a bug fix. Our choice was deliberate.
We wanted to capture the issues that developers continuously face in an ongoing development process, rather than
reported bugs. However, this choice poses a threat of
overestimation. Our categorization of domains is subject to interpreter bias, although another member of our group verified
the categories. Also, our effort to categorize bug fix commits
could potentially be tainted by the initial choice of keywords.
The descriptiveness of commit logs varies across projects. To
mitigate these threats, we evaluate our classification against
manual annotation as discussed in Section 2.4.
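The keyword heuristic described above can be sketched in a few lines of Python. This is only an illustration: the keyword list and the function name `is_bug_fix` are assumptions made for the example, not the exact list or tooling used in the study.

```python
import re

# Hypothetical keyword list, chosen for illustration only.
BUG_FIX_KEYWORDS = ("bug", "fix", "error", "defect", "fault",
                    "flaw", "mistake", "incorrect", "issue")

# Match a keyword at the start of a word, ignoring case, so that
# inflected forms such as "fixes" or "fixed" are also caught.
_PATTERN = re.compile(r"\b(?:" + "|".join(BUG_FIX_KEYWORDS) + r")",
                      re.IGNORECASE)


def is_bug_fix(message: str) -> bool:
    """Return True if the commit message mentions a bug-fix keyword."""
    return _PATTERN.search(message) is not None
```

A message such as "Add issue template" would also be flagged by this sketch, which illustrates the overestimation threat noted above.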
We determine the language of a file based on its extension. This can be error-prone if a file written in a different
language takes a common language extension that we have
studied. To reduce such error, we manually verified language
categorization against a randomly sampled file set.
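A minimal sketch of extension-based language detection follows. The extension table is a partial, hypothetical mapping and `language_of` is an illustrative name; neither is taken from the study's tooling.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical, partial extension table for illustration.
EXTENSION_TO_LANGUAGE = {
    ".c": "C",
    ".cpp": "C++",
    ".go": "Go",
    ".rb": "Ruby",
    ".php": "Php",
    ".py": "Python",
    ".m": "Objective-C",  # ambiguous: MATLAB files also use .m
}


def language_of(path: str) -> Optional[str]:
    """Guess a file's language from its extension, or None if unknown."""
    suffix = PurePosixPath(path).suffix.lower()
    return EXTENSION_TO_LANGUAGE.get(suffix)
```

The `.m` entry shows how such a lookup can misclassify a file, which is why a manual check against sampled files is useful.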
To interpret language class in Section 2.2, we make certain
assumptions based on how a language property is most
commonly used, as reflected in our data set. For example, we
classify Objective-C as unmanaged memory type rather
in dynamic languages like Ruby and Php have fewer concur-
CoffeeScript, and TypeScript do not support concurrency
in its traditional form, while Php has limited support
depending on the implementation. These languages introduce
artificial zeros in the data, and thus the concurrency
model coefficients in Table 8 for those languages cannot be
interpreted like the other coefficients. Because of these artificial
zeros, the average over all languages in this model is smaller,
which may affect the sizes of the coefficients, since they are
given with respect to the average, but it will not affect the
relationships among them, which is what we are after.
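The arithmetic behind this argument can be illustrated with a toy example. The numbers and language names below are hypothetical, and the sketch uses plain deviations from a mean rather than the regression model itself: adding zeros lowers the average, so every deviation shifts by the same amount, while the gap between any two languages stays the same.

```python
import math
from statistics import mean


def deviations_from_mean(effects):
    """Express each effect relative to the average over all languages."""
    average = mean(effects.values())
    return {name: value - average for name, value in effects.items()}


# Hypothetical effects for three languages that support concurrency.
supported = {"LangA": 2.0, "LangB": 1.0, "LangC": 3.0}
# The same data plus two languages contributing artificial zeros.
with_zeros = {**supported, "LangX": 0.0, "LangY": 0.0}

before = deviations_from_mean(supported)
after = deviations_from_mean(with_zeros)

# The average drops, so every deviation grows by the same amount...
assert all(after[name] > before[name] for name in supported)
# ...but the gap between any two languages is unchanged.
assert math.isclose(before["LangC"] - before["LangA"],
                    after["LangC"] - after["LangA"])
```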
A textual analysis based on word-frequency of the bug fix
messages suggests that most of the concurrency errors occur
due to a race condition, deadlock, or incorrect synchronization, as shown in the table above. Across all languages, race
conditions are the most frequent cause of such errors; for
example, 92% in Go. The enrichment of race condition errors
in Go is probably due to an accompanying race-detection tool
that may help developers locate races. The synchronization
errors are primarily related to message passing interface
(MPI) or shared memory operation (SHM). Erlang and Go
use MPIe for inter-thread communication, which explains
why these two languages do not have any SHM-related errors
such as locking, mutex, etc. In contrast, projects in the other
languages use SHM primitives for communication and
may thus have locking-related errors.
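A word-frequency analysis of this kind might look like the following sketch. The token-to-cause table and the function name `cause_frequencies` are assumptions for illustration; the study's actual vocabulary and procedure are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical mapping from message tokens to a concurrency error cause.
TOKEN_TO_CAUSE = {
    "race": "race condition",
    "races": "race condition",
    "deadlock": "deadlock",
    "deadlocks": "deadlock",
    "lock": "synchronization",
    "locking": "synchronization",
    "mutex": "synchronization",
    "synchronization": "synchronization",
}


def cause_frequencies(messages):
    """Count, per cause, how many messages mention at least one of its tokens."""
    counts = Counter()
    for message in messages:
        tokens = set(re.findall(r"[a-z]+", message.lower()))
        causes = {TOKEN_TO_CAUSE[t] for t in tokens if t in TOKEN_TO_CAUSE}
        counts.update(causes)
    return counts
```

Matching whole tokens rather than substrings keeps "deadlock" from also being counted under "lock".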
Security and other impact errors. Around 7.33% of all the
bug fix commits are related to Impact errors. Among them,
Erlang, C++, and Python associate with more security
errors than average (Table 8). Clojure projects associate
with fewer security errors (Figure 2). From the heat map we
also see that Static languages are in general more prone
to failure and performance errors, these are followed by
such as Erlang. The analysis of deviance results confirm
that language is strongly associated with failure impacts.
While security errors are the weakest among the categories, the deviance explained by language is still quite strong
when compared with the residual deviance.
Result 4: Defect types are strongly associated with languages;
some defect types, such as memory errors and concurrency errors,
also depend on language primitives. Language matters more
for specific categories than it does for defects overall.
4. RELATED WORK
Prior work on programming language comparison falls into
three categories:
(1) Controlled experiment. For a given task, developers
are monitored while programming in different languages.
Researchers then compare outcomes such as development
effort and program quality. Hanenberg7 compared static versus
dynamic typing by monitoring 48 programmers for 27 h
while they developed a parser program. He found no significant
difference in code quality between the two; however, dynamic
type-based languages were found to have shorter development
time. That study was conducted with undergraduate
e MPI does not require locking of shared resources.