Microsoft’s Experiments with Software Transactional Memory Have Ended
Dana Groff has announced the end of Microsoft’s experiment with software transactional memory for the .NET Framework. Known as STM.NET, this research project was announced in 2008 as an alternative to explicit locks when dealing with concurrency issues.
In theory software transactional memory works just like transactions in a database. Ideally no two threads can try to modify the same piece of data at the same time, dirty reads are not possible, and deadlocks are automatically detected and handled. Of course this also leads to the same issues such as questions about optimistic versus pessimistic locking, lock granularity, and the performance impact.
Databases also have some distinct advantages over normal software in the way they structure data. Every row is an atomic unit, to be locked, read, and updated as a whole. And the location of every row is in a fixed location, allowing for techniques such as lock escalation when too many fine-grained locks would otherwise be needed. With normal software each object must be referenced via pointer chains to roots, preventing any semantic groupings and raising the question, “What is the atomic unit in this context?”
Back in January Joe Duffy, Microsoft’s best known researcher on parallel and concurrent programming, cited four reasons why he becomes disillusioned with STM in his Brief Retrospective on Transactional Memory.
The first reason is the I/O problem. Most forms of I/O are inherently non-transactional, and as distributed computing becomes more popular, the problems become more unavoidable. Joe Duffy doesn’t just cite file access, but also event logs, web service calls, user interaction, platform invokes. He continues,
The question ultimately boils down to this: is the world going to be transactional, or is it not?
Whether unbounded transactions foist unto the world will succeed, I think, depends intrinsically on the answer to this question. It sure looked like the answer was going to be “Yes” back when transactional NTFS and registry was added to Vista. But the momentum appears to have slowed dramatically.
The second issue is the question of weak or strong atomicity. To grossly simplify the matter, it comes down to whether you strictly require transactional objects to only be read and written to in a transaction or you trust the developer to know what he is doing. The former will lead to a lot of problems around sharing data between transactional and non-transactional parts of an application, while the latter is inherently error prone.
The third reason is the question of privatization. As I mentioned earlier, normal software doesn’t have the clean boundaries between atomic units that databases do. Further complicating matters is that what’s considered an atomic unit can change over the lifespan of the data structure.
I still remember the day like it was yesterday. A regular weekly team meeting, to discuss our project’s status, future, hard problems, and the like. A summer intern on board from a university doing pioneering work in TM, sipping his coffee. Me, sipping my tea. Then that same intern’s casual statement pointing out an Earth-shattering flaw that would threaten the kind of TM we (and most of the industry at the time) were building. We had been staring at the problem for over a year without having seen it. It is these kinds of moments that frighten me and make me a believer in formal computer science.
The Tx0 transaction changes itIsOwned to true, and then commits. After it has committed, it proceeds to using whatever state was claimed (in this case an object referred to by variable x) outside of the purview of TM. Meanwhile, another transaction Tx1 has optimistically read itIsOwned as false, and has gone ahead to use x. An update in-place system will allow that transaction to freely change the state of x. Of course, it will roll back here, because isItOwned changed to true. But by then it is too late: the other thread using x outside of a transaction will see constantly changing state – torn reads even – and who knows what will happen from there. A known flaw in any weakly atomic, update in-place TM.
If this example appears contrived, it’s not. It shows up in many circumstances. The first one in which we noticed it was when one transaction removes a node from a linked list, while another transaction is traversing that same list. If the former thread believes it “owns” the removed element simply because it took it out of the list, someone’s going to be disappointed when its state continues to change.
Mr. Duffy’s final reason for being disillusioned with software transactional memory is the lack of any real successes. He writes,
Throughout all of this, we searched and searched for the killer TM app. It’s unfair to pin this on TM, because the industry as a whole still searches for a killer concurrency app. But as we uncovered more successes in the latter, I became less and less convinced that the killer concurrency apps we will see broadly deployed in the next 5 years needed TM. Most enjoyed natural isolation, like embarrassingly parallel image processing apps. If you had sharing, you were doing something wrong.
I eventually shifted focus to enforcing coarse-grained isolation through message-passing, and fine-grained isolation through type system support a la Haskell’s state monad.
I took this path not because I thought TM had no place in the concurrency ecosystem. But rather because I believed it did have a place, but that several steps would be needed before getting there.
For myself, I read that as realizing an approach such as seen in Clojure or Haskell is required. But, whether that's true or not, it is clear that what failed was not the idea of STM, but the approach they took to it.
Maybe with a truly relational language
Francisco Jose Peredo Noguez