parts, a centralized tool will make the
history of a branch appear more linear.
Whether this is a strength or a weakness
seems to be a matter of perspective. A
more linear history is easier to understand, and so requires less revision-control sophistication from developers.
On the other hand, a history containing
numerous small branches and merges
more accurately reflects the true history
of a project and makes it clearer which
project state a team member’s code
was based on when working. For teams
that prefer to keep project history tidy,
both Git and Mercurial offer rebase
commands that can turn the chaotic
history of a feature into a neater collection of logical changes, more suited
to an eventual merger into a project’s
main branch.
Centralized tools can offer policy
advantages that are more difficult to
achieve with distributed tools. For
example, it is possible to configure a
pre-commit script that will reject an
attempted commit if it introduces an
automated test-suite failure. With a distributed tool, this kind of check can be
put in place on a shared central server,
but that cannot protect developers from
sharing inadvertently broken changes
with each other horizontally, from one
laptop to another.
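As a rough illustration of the policy check described above, here is a minimal sketch in Python. It assumes only that the revision-control system treats a nonzero exit status from a hook script as a rejection. The name hook_exit_status and the unittest command are hypothetical placeholders, not part of any tool's interface. A real server-side hook would also have to obtain the contents of the incoming commit before testing them, which this sketch leaves out.

```python
import subprocess
import sys
from typing import Sequence

# Hypothetical test command: replace with whatever runs the project's suite.
TEST_COMMAND = [sys.executable, "-m", "unittest", "discover"]


def hook_exit_status(command: Sequence[str]) -> int:
    """Run the test command; return 0 to accept the commit, 1 to reject it."""
    result = subprocess.run(command)
    if result.returncode == 0:
        return 0
    # Anything other than a clean exit counts as a failed test run.
    print("commit rejected: the test suite failed", file=sys.stderr)
    return 1
```

A hook script built on this would end by calling sys.exit(hook_exit_status(TEST_COMMAND)), so that a failing suite turns into a rejected commit.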
What Behaviors Does a Distributed Tool Change?
The availability of cheap local commits
makes the use of a rapid-fire style of development attractive with distributed
tools. Suppose Alice is partway through
a complicated change and decides
that she wants to speculatively refactor a piece of code. With a distributed
tool, she can commit her change as is,
without worrying too much whether
the project is in a sane state, and try her
speculative change. If that experiment
fails, she can revert it and continue on
her way, eventually using the rebase
command to eliminate some of the in-progress commits she made while she
figured out what she was doing.
While this style of development is
certainly possible with Subversion,
experience suggests that it is far more
common with the distributed tools.
My conjecture is that the privacy of a
branch on a developer’s laptop and the
instantaneous responsiveness of the
distributed tools somehow combine to
encourage more aggressive and
pervasive use of revision control.
I have observed a similar effect with
merges. Because they are such bread-and-butter activities with distributed
tools, in many projects they occur far
more frequently than with their centralized counterparts. Although all
merges require effort and incur risk,
when branches merge more frequently,
the merges are smaller and less perilous. Ask any seasoned developer about
a long-delayed merge following a few
months of isolated work, and watch the
blood drain out of his or her face.
What the Future Offers
We are not by any means near the end
of the road in the evolution of revision-control systems. The field has received
only fitful attention from academia.
Much work could be done on its formal foundations, which could lead to
more powerful and safer ways for developers to work together. Alas, I know
of only one notable publication on the
topic in the past decade [1]. As a simple
example, when merging potentially
conflicting changes, almost everybody
uses either three-way merging, which
is decades old, or unpublished ad hoc
approaches in which there is little reason to be confident.
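To make the three-way rule concrete, here is a deliberately simplified sketch in Python. It applies the rule to whole files rather than to regions within a file, which is an assumption made for brevity: real tools first align the texts and then apply the same decision to each changed region. The Tree type and the function name three_way_merge are hypothetical and do not correspond to any tool's actual interface.

```python
from typing import Dict, Optional, Set, Tuple

Tree = Dict[str, str]  # hypothetical model: file path -> file contents


def three_way_merge(base: Tree, ours: Tree, theirs: Tree) -> Tuple[Tree, Set[str]]:
    """Merge two descendants of `base`, deciding one whole file at a time."""
    merged: Tree = {}
    conflicts: Set[str] = set()
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b: Optional[str] = base.get(path)  # None means the file is absent
        o: Optional[str] = ours.get(path)
        t: Optional[str] = theirs.get(path)
        if o == t:
            result = o  # both sides agree, including both deleting the file
        elif o == b:
            result = t  # only their side changed the file
        elif t == b:
            result = o  # only our side changed the file
        else:
            conflicts.add(path)  # both sides changed it in different ways
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts


base = {"a.txt": "1", "b.txt": "2", "c.txt": "3"}
ours = {"a.txt": "1-ours", "b.txt": "2", "c.txt": "3-ours"}
theirs = {"a.txt": "1", "b.txt": "2", "c.txt": "3-theirs", "d.txt": "4"}
merged, conflicts = three_way_merge(base, ours, theirs)
```

In this example a.txt takes our edit, d.txt is picked up from their side, and c.txt is reported as a conflict because both sides changed it differently. The common ancestor is what lets the rule tell an edit on one side apart from a deletion on the other.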
More practically, there are plenty of
advances to be made in the way that
distributed tools handle large projects
with deep histories, for which they are
a poor fit because of the volume of data
involved. For organizations that have
sensitive needs around assurance and
security, the centralized tools do somewhat better than the distributed ones,
but both could improve substantially.
Conclusion
Choosing a revision-control system is
a question with a surprisingly small
number of absolute answers. The fundamental issues to consider are what
kind of data your team works with, and
how you want your team members to interact. If you have masses of frequently
edited binary data, a distributed revision-control system may simply not
suit your needs. If agility, innovation,
and remote work are important to you,
the distributed systems are far more
likely to suit your needs; a centralized
system may slow your team down in
comparison.
There are also many second-order
considerations. For example, firewall
management may be an issue: Mercurial and Subversion work well over
HTTP and with SSL (Secure Sockets
Layer), but Git is unusably slow over
HTTP. For security, Subversion offers access controls down to the level
of individual files, but Mercurial and
Git do not. For ease of learning and
use, Mercurial and Subversion have
simple command sets that resemble
each other (easing the transition from
one to the other), whereas Git exposes
a potentially overwhelming amount of
complexity. When it comes to integration with build tools, bug databases,
and the like, all three are easily scriptable. Many software development tools
already support or have plug-ins for
one or more of these tools.
Given the demands of portability,
simplicity, and performance, I usually
choose Mercurial for new projects,
but a developer or team with different
needs or preferences could legitimately
choose any of them and be happy in the
long term. We are fortunate that it is
easy to interoperate among these three
systems, so experimentation with the
unknown is simple and risk-free.
Acknowledgments
I would like to thank Bryan Cantrill,
Eric Kow, Ben Collins-Sussman, and
Brendan Cully for their feedback on
drafts of this article.
Related articles
on queue.acm.org
A Conversation with Steve Bourne, Eric
Allman, and Bryan Cantrill
http://queue.acm.org/detail.cfm?id=1454460
Distributed Development: Lessons Learned
Michael Turnlund
http://queue.acm.org/detail.cfm?id=966801
Kode Vicious Strikes Again
http://queue.acm.org/detail.cfm?id=1036484
References
1. Löh, A., Swierstra, W., Leijen, D. A principled approach
to version control, 2007;
http://people.cs.uu.nl/andres/VersionControl.html.
Bryan O’Sullivan is an Irish hacker and writer based
in San Francisco. His interests include functional
programming, HPC, and building large distributed
systems. He is the author of the Jolt Award-winning Real
World Haskell (2008) and Mercurial: The Definitive Guide
(2009), both published by O’Reilly.