Recording transactions. In the explanation above we assumed that the exact location of each memory location used in the transaction is recorded. This might be inefficient, though, especially with HTM support. Recording information for every memory location would mean having an overhead of several words for each memory location used. As with CPU caches, which theoretically could also cache individual words, this often constitutes too high of a price. Instead, CPU caches today handle cache lines of 64 bytes at once. This would mean for a TM implementation that step 2 in our description would not have to record an additional dependency in case the memory location is in a block that is already recorded.
But this introduces problems as well. Assume that in the final example code all four variables are in the same block. This means that our assumption of f1 _ 1 and f2 _ 2 being independently executable is wrong. This type of block sharing leads to high abort rates. It is related to the problem of false sharing, which in this case also happens and therefore should be corrected anyway.
These false aborts, as we might want to call them, are not only a performance issue, though. Unfortunate placement of variables actually might lead to problems never making any progress at all because they are constantly inadvertently aborting each other. This can happen because of several different transactions going on concurrently that happen to touch the cache memory blocks but different addresses. If blocking is used, this is a problem that must be solved.
illUstration by oli larUelle
Handling aborts. Another detail described earlier is the way aborts are handled. What has been described is the so-called lazy abort/lazy commit method (lazy/lazy for short). Transactions continue to work even if they are already aborted and the results of the transaction are written into the real memory location only when the entire transaction succeeded.
This is not the only possibility, though. Another one is the exact opposite: the eager/eager method. In this case transactions will be recognized as aborted as early as possible and restarted if necessary. The effect of store
instructions will also immediately take effect. In this case the old value of the memory location has to be stored in memory local to the transaction so that, in case the transaction has to be aborted, the previous content can be restored.
There are plenty of other ways to handle the details. It might turn out that no one way is sufficient. Much will depend on the abort rate for the individual transaction. It could very well be that compilers and TM runtimes will implement multiple different ways at the same time and flip between them for individual transactions if this seems to offer an advantage.
Semantics. The semantics of the tm _ atomic block (or whatever it will be in the end) have to be specified. It is necessary to integrate TM into the rest of the language semantics. For example, TM must be integrated with exception handling for C++. Other issues are the handling of nested TM regions and the treatment of local variables (they need not be part of the transaction but still have to be reset on abort).
Performance. Performance is also a major issue. Plenty of optimizations can and should be performed by the compiler, and all this needs research. There are also practical problems. If the same program code is used in contested and uncontested situations (for example, in a single-threaded program), the overhead introduced through TM is too high. It therefore might be necessary to generate two versions of each function: one with TM support and the other without. The TM runtime then has to make sure that the version without TM support is used as frequently as possible. Failure to the one side means loss of performance, failure to the other side means the program will not run correctly.
TM promises to make parallel programming much easier. The concept of transaction is already present in many programs (from business programs to dynamic Web applications) and it proved to be reasonably easy to grasp for programmers. We can see first implementations coming out now, but all are far from ready for prime time. Much research remains to be done.
References
1. drepper, U. What every programmer should know about memory (2007); http://people.redhat.com/ drepper/cpumemory.pdf.
2. herlihy, M., Moss, J.e.b. transactional memory: architectural support for lock-free data structures. in Proceedings of 20th International Symposium on Computer Architecture (1993); http://citeseer.ist.psu. edu/ herlihy93transactional.html.
Ulrich Drepper ( drepper@redhat.com) is a consulting engineer at red hat, where he has worked for the past 12 years. he is interested in all kinds of low-level programming and has been involved with linux for almost 15 years. a previous version of this appeared in the september 2008 issue of ACM Queue.
feBRuaRY 2009 | vol. 52 | No. 2 | CommunICatIons of the aCm
43
References:
http://people.redhat.com/drepper/cpumemory.pdf
http://people.redhat.com/drepper/cpumemory.pdf
Archives