Parallel
Programming
with
Transactional Memory
it possible to speed up the program (i.e., reduce the total
runtime). Amdahl’s law expresses this as:
_____ 1_____
(1-P) + P/S
Here P is the fraction of the program that can be parallelized, and S is the number of execution units.
S YNCHRONIZATION PROBLEMS
This is the theory. Making it a reality is another issue.
Simply writing a normal program by itself is a problem,
as can be seen in the relentless stream of bug fixes available for programs. Trying to split a program into multiple
pieces that can be executed in parallel adds a whole
dimension of additional problems:
• Unless the program consists of multiple independent
pieces from the onset and should have been written as
separate programs in the first place, the individual pieces
have to collaborate. This usually takes the form of sharing
data in memory or on secondary storage.
• Write access to shared data cannot happen in an uncontrolled fashion. Allowing a program to see an inconsistent, and hence unexpected, state must be avoided at all
times. This is a problem if the state is represented by the
content of multiple memory locations. Processors are not
able to modify an arbitrary number (in most cases not
even two) of independent memory locations atomically.
• To deal with multiple memory locations, “traditional”
parallel programming has had to resort to synchronization. With the help of mutex (mutual exclusion) directives, a program can ensure that it is alone in executing
an operation protected by the mutex object. If all read or
write accesses to the protected state are performed while
holding the mutex lock, it is guaranteed that the program
will never see an inconsistent state. Today’s programming environments (e.g., POSIX) allow for an arbitrary
number of mutexes to coexist, and there are special types
of mutexes that allow for multiple readers to gain access
concurrently. The latter is allowed since read accesses do
not change the state. These mechanisms allow for reasonable scalability if used correctly.
• Locking mutexes open a whole new can of worms,
though. Using a single program-wide mutex would in
most cases dramatically hurt program performance by
decreasing the portion of the program that can run in
parallel (P in the formula). Using more mutexes increases
not only P, but also the overhead associated with locking
and unlocking the mutexes. This is especially problematic if, as it should be, the critical regions are only lightly
contended. Dealing with multiple mutexes also means
the potential for deadlocks exists. Deadlocks happen
if overlapping mutexes are locked by multiple threads
in a different order. This is a mistake that happens all
too easily. Often the use of mutexes is hidden in library
functions and not immediately visible, complicating the
whole issue.
THE PROGRAMMER’S DILEMMA
The programmer is caught between two problems:
• Increasing the part of the program that can be executed
in parallel (P).
• Increasing the complexity of the program code and
therefore the potential for problems.
An incorrectly functioning program can run as fast as
you can make it run, but it will still be useless. Therefore,
the parallelization must go only so far as not to introduce
problems of the second kind. How much parallelism this
is depends on the experience and knowledge of the programmer. Over the years many projects have been developed that try to automatically catch problems related to
locking. None is succeeding in solving the problem for
programs of sizes that appear in the real world. Static
analysis is costly and complex. Dynamic analysis has to
depend on heuristics and on the quality of the test cases.
For complex projects it is not possible to convert
the whole project at once to allow for more parallelism.
Instead, programmers iteratively add ever more fine-grained locking. This can be a long process, and if the
testing of the intermediate steps isn’t thorough enough,
problems that are not caused by the most recently added
set of changes might pop up. Also, as experience has
shown, it is sometimes very hard to get rid of the big
locks. For an example, look at the BKL (big kernel lock)
discussions on the Linux kernel mailing list. The BKL was
introduced when Linux first gained SMP (symmetric
multiprocessing) support in the mid-90s, and we still
haven’t gotten rid of it in 2008.