server infrastructure required.
Both the centralized and distributed
approaches to publication offer trade-offs. With a small, tightly knit team that
is always wired, commit-as-publish can
look like an easier choice. In a more
loosely organized setting—for example,
where team members travel or spend a
lot of time at customer sites—the decoupling of commit from publication
may be a better fit.
Centralized tools can be a good fit
for highly structured “rule the team
with an iron fist” models of management. Access can be controlled by managers, not peers. Whole sections of the
tree can be made writable or readable
only by employees with specific levels
of clearance. Decentralized systems
don’t currently offer much here other
than the ability to split sensitive data
into separate repositories, which is a
touch awkward.
The Pull Model of Development
Many teams begin using a distributed
revision-control system in almost exactly the same way as the centralized system they are replacing. Everyone clones
one of a few central repositories and
pushes the changes back. This familiar
model works well for getting comfortable, but it barely scratches the surface
of the possible styles of interaction.
Since the distributed model emphasizes pulling changes into a local
repository, it naturally fits well with a
development model that favors code
review. Suppose that Alice manages the
repository that will become version 2.4
of her team’s software project. Bob tells
her that he has some changes ready
to submit and gives her the URL from
which she can pull his changes. When
she reads through his changes, she notices that his code doesn’t handle error
conditions correctly, so she asks him to
revise his work before she will accept,
merge, and publish it.
Of course, a team may agree to use
a “review before merge” policy with a
centralized system, but the default behavior of the software is more permissive. Therefore, a team has to take explicit steps to constrain itself.
Merges, Names, and Software Archaeology
Given their backgrounds, it is no surprise that Mercurial and Git have similar
approaches to merging changes,
whereas Subversion does things differently.
Since merges occur so frequently
with Mercurial and Git, both tools have well-engineered capabilities in this realm.
The typical cases that trip up revision-control systems during merges are files
and directories that have been renamed
or deleted. Both Mercurial and Git handle renames cleanly.
Subversion’s merge machinery is
complicated and fragile. For example,
files that had been renamed used to
disappear in merges. This severe bug
has been partly addressed so that files
are now renamed, but they may contain
the wrong contents. It is not clear that
this is really a step forward.
A subtler problem with file naming
often hits cross-platform development
teams. Windows, OS X, and Unix systems
have different conventions for handling
the case of file names (for example, different
answers to the question of whether FOO.TXT
is the same name as foo.txt). Mercurial outshines its competition here.
It can detect—and work safely with—a
case-insensitive file system that is being
used on an operating system that is by
default sensitive to case.
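
To make the naming hazard concrete, here is a minimal sketch of a check for tracked names that would collide on a case-insensitive file system. It is an illustration only, not how Mercurial or any other tool implements its detection. The function name `case_collisions` and the sample paths are invented for the example, and `str.casefold` is an assumption that only approximates the folding rules of a real file system.

```python
from collections import defaultdict


def case_collisions(paths):
    """Return groups of paths that differ only by letter case."""
    groups = defaultdict(set)
    for path in paths:
        # Assumption: casefold() stands in for the file system's own
        # case-folding rules, which vary between platforms.
        groups[path.casefold()].add(path)
    # Keep only the groups in which two or more distinct names collide.
    return [sorted(names) for names in groups.values() if len(names) > 1]


tracked = ["README", "src/FOO.TXT", "src/foo.txt", "src/bar.c"]
print(case_collisions(tracked))  # [['src/FOO.TXT', 'src/foo.txt']]
```

A check like this could run before a commit on a case-sensitive system to warn that colleagues on case-insensitive systems will not be able to check out both files.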
Often, a developer’s first response
to receiving a new bug report will be to
look through a project’s history to see
what has changed recently or to annotate the source files to see who modified them and when. These operations
are instantaneous with the distributed
tools, because all the data is stored on a
developer’s computer, but they can be
slow when run against a distant or congested Subversion server. Since humans
are impatient creatures, extra wait time
will reduce the frequency with which
these useful commands are run. This
is another way in which responsiveness
has a disproportionate effect on how
people use their software.
A Powerful New Way to Find Bugs
Although a simple display of history is
useful, it would be far more interesting
to have a way of pinpointing the source
of a bug automatically. Git introduced a
technique to do so via the bisect command (which proved so useful that Mercurial acquired a bisect command
of its own). This technique is trivial to
learn: you tell the bisect command about
one revision that you know did not have
the bug and another that you know
does have it. It then checks out a
revision between the two and asks you whether that revision contains the bug; it repeats this
until it identifies the revision where the
bug first arose.
This is appealing to developers in
part because it is easy to automate.
Write a tiny script that builds your
software and tests for the presence of
the bug; fire off a bisect; then come
back later and find out which revision
introduced the problem, with no further manual intervention required. The
other reason that bisect is appealing is that it operates in logarithmic
time. Tell it to search a range of 1,000
revisions, and it will ask only about 10
questions. Widen the search to 10,000
revisions, and the number of questions
increases to just 14.
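
The arithmetic is that of a binary search. The sketch below simulates the idea on a purely linear history and counts the questions asked. It is not the implementation used by Git or Mercurial, which must also cope with branching history. The function `bisect` and the predicate `is_bad` are names invented for the illustration, and the sketch assumes that the first revision is good, the last is bad, and the bug never disappears once it has been introduced.

```python
def bisect(revisions, is_bad):
    """Return (first bad revision, number of questions asked).

    Assumes revisions[0] is known good, revisions[-1] is known bad,
    and every revision after the first bad one is also bad.
    """
    good, bad = 0, len(revisions) - 1
    questions = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        questions += 1              # one "does this revision have the bug?"
        if is_bad(revisions[mid]):
            bad = mid               # the bug was introduced at or before mid
        else:
            good = mid              # the bug was introduced after mid
    return revisions[bad], questions


# Worst case over every possible position of the first bad revision.
for n in (1000, 10000):
    revisions = list(range(n))
    worst = max(
        bisect(revisions, lambda r: r >= first)[1] for first in range(1, n)
    )
    print(n, worst)  # prints "1000 10", then "10000 14"
```

Each answer halves the remaining range, which is why a tenfold increase in history adds only a handful of questions.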
It would be difficult to overemphasize the importance of bisect. Not
only does it completely change the way
that you find bugs, but if you routinely
drive it using scripts, you’ll have effectively developed regression tests on the
fly, for free. Save those tests and use
them!
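
As a rough sketch of such a driver script, the following maps the outcome of a build step and a test step to an exit status. It assumes the convention documented for automated bisection, in which 0 means the revision is good, 125 means it cannot be tested and should be skipped, and most other nonzero values mean it is bad; check your tool's documentation before relying on this. The names `verdict` and `main`, and the commands `make` and `./check_bug.sh`, are placeholders rather than part of any tool.

```python
import subprocess
import sys

GOOD, BAD, SKIP = 0, 1, 125  # assumed exit-status convention for bisect drivers


def verdict(build_rc, test_rc):
    """Map build and test exit codes to a bisect exit status."""
    if build_rc != 0:
        return SKIP  # a revision that cannot be built says nothing about the bug
    return GOOD if test_rc == 0 else BAD


def main(build_cmd, test_cmd):
    """Build the current revision, run the bug check, and return the verdict."""
    build_rc = subprocess.run(build_cmd).returncode
    # Only run the check when the build succeeded.
    test_rc = subprocess.run(test_cmd).returncode if build_rc == 0 else None
    return verdict(build_rc, test_rc)


# At the bottom of a real script (both commands are hypothetical):
#     sys.exit(main(["make"], ["./check_bug.sh"]))
```

Saved as, say, check.py, a script along these lines could be handed to `git bisect run` or to `hg bisect --command` once the known good and bad revisions have been marked. The check itself is the part worth keeping afterward as a regression test.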
The wily reader will observe that
searching the commit history is much
easier with Subversion than with the
distributed tools, since its history is
much more linear. The counterpoint
to this is that the bisect command
is built into the other tools, and hence
more readily available and amenable to
reliable automation.
Daggy Fixes and Cherry-Picking
Once you have found a bug in a piece
of software, merely fixing it is rarely
enough. Suppose that your bug is several years old, and there are three versions
of your software in the field that need
to be patched. Each version is likely to
have a “sustaining” branch where bug
fixes accumulate. The problem is that
although the idea of copying a fix from
one branch to another is simple, the
practice is not so straightforward.
Mercurial, Git, and Subversion all
have the ability to cherry-pick a change
from one branch and apply it to another branch. The trouble with cherry-picking is that it is very brittle. A change
doesn’t just float freely in space: it has
a context—dependencies on the code
that surrounds it. Some of these dependencies are semantic and will cause