degradation when overflow occurs, and proposals for managing overflows (for example, signatures17) incur false positives that add complexity to the programming model. Therefore, from an industrial perspective, HTM designs must deliver more benefit for their cost across a more diverse set of workloads (with varying transactional characteristics) before hardware designers will consider implementing them. (Reuse of the hardware for other purposes can also justify its inclusion, as may be the case for Sun’s implementation of Scout Threading in the Rock processor.18)
• Hybrid systems19, 20, 21, 22 are the most likely platform for the eventual adoption of TM by a wide audience, although the exact mix of hardware and software support remains unclear. A special case of the hybrid system is the hardware-accelerated STM. In this scenario, the transactional semantics are provided by STM, and hardware primitives are used only to speed up critical performance bottlenecks in the STM system. Such systems could offer an attractive solution if the cost of hardware primitives is modest and may be further amortized by other uses.
Independent of these implementation decisions, there are transactional semantics issues that break the ideal transactional programming model for which the community had hoped. TM introduces a variety of programming issues that are not present in lock-based mutual exclusion. For example, semantics are muddled by:
• Interaction with nontransactional codes, including access to shared data from outside of a transaction (tolerating weak atomicity) and the use of locks inside a transaction (breaking isolation to make locking operations visible outside transactions).
• Exceptions and serializability: how to handle exceptions and propagate consistent exception information from within a transactional context, and how to guarantee that transactional execution respects a correct ordering of operations.
• Interaction with code that cannot be transactionalized, as a result of either communication with other threads or a requirement barring speculation.
• Livelock, and whether the system can guarantee that all transactions make progress even in the presence of conflicts.
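To make the point about code that cannot be transactionalized concrete, here is a minimal Python sketch. It assumes a hypothetical optimistic retry loop (`run_transaction`, `Abort`, and `io_log` are illustrative names, not part of any STM discussed in this article) and simulates a single conflict on the first attempt. The buffered memory write is discarded on abort, but the I/O-like side effect cannot be rolled back, so it happens once per attempt.

```python
class Abort(Exception):
    """Signals a simulated conflict (hypothetical name)."""


def run_transaction(body, memory, max_attempts=10):
    """Run body speculatively and retry on Abort.

    Writes go to a private buffer and reach `memory` only on commit.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        write_buffer = {}
        try:
            body(write_buffer, attempt)
        except Abort:
            continue  # discard the buffer and re-execute the body
        memory.update(write_buffer)  # commit the speculative writes
        return attempt
    raise RuntimeError("transaction did not commit")


io_log = []  # stands in for an irreversible action such as a network send


def transfer(write_buffer, attempt):
    write_buffer["balance"] = 90   # speculative: can be thrown away
    io_log.append("sent receipt")  # not speculative: already happened
    if attempt == 1:
        raise Abort()              # simulated conflict on the first try


memory = {"balance": 100}
attempts = run_transaction(transfer, memory)
print(attempts, memory, io_log)
```

In this sketch the balance is updated exactly once, while the receipt is logged twice. That asymmetry is why such operations need either a bar on speculation or some mechanism that defers them until commit.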
In addition to the intrinsic semantic issues, there are also implementation-specific optimizations motivated by high transactional overheads, such as programmer annotations for excluding private data. Furthermore, the nondeterminism introduced by aborting transactions complicates debugging: transactional code may be executed and then aborted on a conflict, which makes it difficult for the programmer to find deterministic paths with repeatable behavior. Both of these dilute the productivity argument for transactions, especially for software-only TM implementations.
Given all these issues, we conclude that TM has not yet matured to the point where it presents a compelling value proposition that will trigger its widespread adoption. While TM can be a useful tool in the parallel programmer’s portfolio, it is not going to solve the parallel programming dilemma by itself. There is evidence that it helps with building certain concurrent data structures, such as hash tables and binary trees. In addition, there are anecdotal claims that it helps with workloads; however, despite several years of active research and publication in the area, we are disappointed to find no mentions in the research literature of large-scale applications that make use of TM. The STAMP23 (Stanford Transactional Applications for Multiprocessing) and Lonestar24 benchmark suites are promising starts but have a long way to go to be representative of full applications.
We base these conclusions on our work over the past two years building a state-of-the-art STM runtime system and compiler framework, the freely available IBM STM.25 Here, we describe this experience, starting with a discussion of STM algorithms and design decisions. We then compare the performance of this STM with two other state-of-the-art implementations (the Intel STM26 and the Sun TL2 STM27), dissect the operations executed by the IBM STM, and provide a detailed analysis of its performance hotspots.
SOFTWARE TRANSACTIONAL MEMORY

STM implements all the transactional semantics in software. That includes conflict detection, guaranteeing the consistency of transactional reads, preservation of atomicity and isolation (preventing other threads from observing speculative writes before the transaction succeeds), and conflict resolution (transaction arbitration).
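The four responsibilities listed above can be illustrated with a deliberately small Python sketch. It is not the algorithm of the IBM, Intel, or Sun TL2 STMs; every name in it (`TVar`, `Transaction`, `Conflict`, `atomically`) is hypothetical, and it assumes a single global commit lock and per-variable version numbers to keep the logic short. Writes are buffered privately (isolation), reads are validated against recorded versions (conflict detection and read consistency), and a conflicting transaction simply retries (the most basic form of conflict resolution).

```python
import threading

# Assumption: one global lock serializes commits and validated reads.
_commit_lock = threading.Lock()


class Conflict(Exception):
    """Raised when a recorded read is no longer current."""


class TVar:
    """A transactional variable: a value plus a version bumped per commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    def __init__(self):
        self.read_set = {}   # TVar -> version seen at first read
        self.write_set = {}  # TVar -> speculative value, private to this tx

    def _validate(self):
        # Conflict detection: any change to a variable we read aborts us.
        for tvar, seen in self.read_set.items():
            if tvar.version != seen:
                raise Conflict()

    def read(self, tvar):
        if tvar in self.write_set:  # read our own speculative write
            return self.write_set[tvar]
        with _commit_lock:
            self._validate()        # keep all reads mutually consistent
            self.read_set.setdefault(tvar, tvar.version)
            return tvar.value

    def write(self, tvar, value):
        self.write_set[tvar] = value  # invisible to others until commit

    def commit(self):
        with _commit_lock:
            self._validate()
            for tvar, value in self.write_set.items():
                tvar.value = value    # publish all writes atomically
                tvar.version += 1


def atomically(body):
    """Run body(tx) until it commits; a conflict simply retries."""
    while True:
        tx = Transaction()
        try:
            result = body(tx)
            tx.commit()
            return result
        except Conflict:
            continue


counter = TVar(0)


def increment(tx):
    tx.write(counter, tx.read(counter) + 1)


def worker():
    for _ in range(250):
        atomically(increment)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)
```

Even in this toy, every transactional read and write goes through extra bookkeeping and a validation pass, which hints at where STM overhead comes from. The immediate-retry policy also offers no progress guarantee under heavy contention; a real system would add some form of arbitration or backoff.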