change to be cherry-picked cleanly but
to fail later. Many dependencies are
simply textual: someone went through
and changed every instance of the word
banana to orange in the destination
branch, and a cherry-picked change
that refers to bananas can no longer be
applied cleanly.
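The banana-to-orange failure can be sketched in a few lines. This is a toy model rather than how any real tool applies patches: the helper apply_change and the sample lines are hypothetical, and a "change" is reduced to replacing one exact line of context.

```python
def apply_change(lines, old, new):
    """Replace the line equal to `old` with `new`; fail if that context is gone."""
    if old not in lines:
        raise ValueError(f"cannot apply cleanly: context {old!r} not found")
    index = lines.index(old)
    return lines[:index] + [new] + lines[index + 1:]


# Hypothetical file contents on the branch where the change was written.
source_branch = ["peel the banana", "eat the banana"]

# On the destination branch, someone renamed banana to orange everywhere.
destination_branch = [line.replace("banana", "orange") for line in source_branch]

# The change being cherry-picked still refers to bananas.
change = ("eat the banana", "eat the ripe banana")

# It applies where it was written...
patched_source = apply_change(source_branch, *change)

# ...but its context no longer exists on the destination branch.
try:
    apply_change(destination_branch, *change)
    applied_cleanly = True
except ValueError:
    applied_cleanly = False
```

The change itself is unchanged between the two attempts; only the surrounding text differs, which is why this kind of dependency is called textual.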
The usual approach when cherry-picking fails because of a textual problem (sadly, a common occurrence) is to
inspect the change by eye and reenter
it by hand in a text editor. Distributed
revision-control systems have come up
with some powerful techniques to handle this type of problem.
Perhaps the most powerful approach is that taken by Darcs, a distributed revision-control system that
is truly revolutionary in how it looks at
changes. Instead of a simple chain or
graph of changes, Darcs has a much
more powerful theory of how changes
depend on each other. This allows it to
be enormously more successful at cherry-picking changes than any other distributed revision-control system. Why
isn’t everyone using Darcs, then? For
years, it had severe performance problems that made it completely impractical. These have been addressed, to the
point where it is now merely quite slow.
Its more fundamental problem is that
its theory is tricky to grasp, so two developers who are not immersed in Darcs
lore can have trouble telling whether
they have the same changes or not.
Let us return to the fold of Mercurial and Git. Since these tools offer the
ability to make a commit on top of any
revision, thereby spawning a tiny anonymous branch, a viable alternative to
cherry-picking is as follows: use bisect
to identify the revision where a bug
arose; check out that revision; fix the
bug; and commit the fix as a child of the
revision that introduced the bug. This
new change can easily be merged into
any branch that had the original bug,
without any sketchy cherry-picking antics required. It uses a revision-control
tool’s normal merge and conflict-resolution machinery, so it is far more reliable than cherry-picking (the implementation of which is almost always a
series of grotesque hacks).
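The shape of this technique can be illustrated with a small model of a history graph. The sketch below does not call Mercurial or Git; the revision names, branch names, and the helpers ancestors and commit are all hypothetical, and the revision that introduced the bug is assumed to have been found already (for example, with bisect).

```python
# Hypothetical history: each revision maps to the list of its parents.
parents = {
    "r1": [],
    "r2": ["r1"],   # r2 is the revision that introduced the bug
    "r3": ["r2"],   # tip of branch "stable"
    "r4": ["r2"],
    "r5": ["r4"],   # tip of branch "dev"
    "old": ["r1"],  # tip of a branch that forked before the bug existed
}
tips = {"stable": "r3", "dev": "r5", "old": "old"}


def ancestors(rev):
    """Return rev plus every revision reachable by following parent links."""
    seen = set()
    stack = [rev]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def commit(name, parent_revs):
    """Record a new revision with the given parents and return its name."""
    parents[name] = list(parent_revs)
    return name


buggy = "r2"

# Commit the fix as a child of the revision that introduced the bug.
fix = commit("fix", [buggy])

# A branch has the bug exactly when the buggy revision is in its history.
affected = sorted(b for b, tip in tips.items() if buggy in ancestors(tip))

# Bring the fix to each affected branch with an ordinary two-parent merge.
for branch in affected:
    tips[branch] = commit(f"merge-fix-into-{branch}", [tips[branch], fix])
```

Because the fix descends directly from the buggy revision, every affected branch shares that revision as a common ancestor with the fix, so a normal merge is enough. The branch that forked before the bug is left alone.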
This technique of going back in history to fix a bug, then merging the fix
into modern branches, was given the
name “daggy fixes” by the authors of
Monotone, an influential distributed
revision-control system. The fixes are
called daggy because they take advantage of a project’s history being structured as a directed acyclic graph, or
DAG. While this approach could be used
with Subversion, its branches are heavyweight compared with the distributed
tools, making the daggy-fix method less
practical. This underlines the idea that
a tool’s strengths will inform the techniques that its users bring to bear.
Strengths of Centralized Tools
One area where the distributed tools
have trouble matching their centralized
competitors is with the management of
binary files, large ones in particular. Although many software disciplines have
a policy of never putting binary files
under the management of a revision-control system, doing so is important
in some fields, such as game development and EDA (electronic design automation). For example, it is common for
a single game project to version tens of
gigabytes of textures, skeletons, animations, and sounds. Binary files differ
from text files in usually being difficult
to compress and impossible to merge.
Each of these brings its own challenges.
If a moderately large binary file is
stored under revision control and modified many times, the space needed to
store each revision can quickly become
greater than the space required for all
text files combined. In a centralized
system, this overhead is paid only once,
on the central server. With a distributed system, each repository on every
laptop will have a complete copy of that
file’s history. This can both ruin performance and impose an unacceptable
storage cost.
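A back-of-the-envelope calculation makes the difference concrete. The numbers below are invented for illustration (a 50 MB file, 200 revisions, 30 developers), and the sketch assumes the worst case described above, in which each revision compresses so poorly that it is stored at close to full size.

```python
# Hypothetical figures, chosen only to illustrate the arithmetic.
file_size_mb = 50   # size of one revision of the binary file
revisions = 200     # number of times the file has been modified
developers = 30     # number of laptops holding the project

# Assumption: revisions barely compress, so history grows by the
# full file size with every modification.
history_mb = file_size_mb * revisions

# Centralized: the full history lives once on the server, and each
# laptop holds only the revision that is currently checked out.
central_server_mb = history_mb
central_per_laptop_mb = file_size_mb

# Distributed: every clone carries the complete history of the file
# (the checked-out copy is ignored here for simplicity).
distributed_per_laptop_mb = history_mb
distributed_total_mb = history_mb * developers
```

Under these assumptions the history of this one file is about 10 GB. The centralized server stores it once, while the distributed setup stores it on each of the 30 laptops, for roughly 300 GB in total.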
When two people modify a binary
file, for most file formats there is no way
to tell what the differences are between
their versions of the file, and it is even
rarer for software to help with resolving
conflicts between their respective modifications. As a way of avoiding merging
binary files, centralized systems offer
the ability to lock files, so that only one
person can edit a file in a given branch
at any time. Distributed systems cannot
by their nature offer locking, so they
must rely on social norms (for example,
a team policy of only one person ever
modifying certain kinds of files).
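Locking depends on a single authority that every user consults, which is what a central server provides. The following is a minimal sketch of such a lock table, not the implementation of any particular tool; the class LockServer, its methods, and the file and user names are hypothetical.

```python
class LockServer:
    """Toy model of the lock table a central server could maintain."""

    def __init__(self):
        # Maps (branch, path) to the user currently holding the lock.
        self._locks = {}

    def lock(self, branch, path, user):
        """Try to take the lock; return True if `user` now holds it."""
        owner = self._locks.setdefault((branch, path), user)
        return owner == user

    def unlock(self, branch, path, user):
        """Release the lock; only the current holder may do so."""
        key = (branch, path)
        if self._locks.get(key) != user:
            return False
        del self._locks[key]
        return True


server = LockServer()

# Alice locks a texture on trunk; Bob's attempt on the same file is refused.
alice_first = server.lock("trunk", "hero.png", "alice")
bob_blocked = server.lock("trunk", "hero.png", "bob")

# Locks are per branch, so Bob can still lock the file on another branch.
bob_other_branch = server.lock("release", "hero.png", "bob")

# Bob cannot release Alice's lock; once she does, he can take it.
bob_cannot_unlock = server.unlock("trunk", "hero.png", "bob")
alice_unlocks = server.unlock("trunk", "hero.png", "alice")
bob_after = server.lock("trunk", "hero.png", "bob")
```

The sketch works only because there is exactly one lock table. With many independent repositories and no required server, there is no single place to keep it, which is why distributed tools fall back on team conventions.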
Relative to its distributed counter-