InfoQ

News

Software Transactions: A Programming Language Perspective

Posted by Jean-Jacques Dubray on Mar 20, 2008 01:00 AM

Community
Architecture
Topics
Transactions Processing
Tags
Concurrency,
ACID
Erlang has recently generated a lot of interest as a language that can deal both efficiently and elegantly with concurrency. In particular Erlang is thought to be a natural programming language for multicore processors. One of the key design concepts of Erlang is that there is no shared memory between "process" instances. They only communicate via asynchronous messages. Nevertheless, Shared Memory Concurrency is an intense research subject.

Last week, UWTV published a webcast by Dan Grossman that gave an overview of the state-of-the-art for Transactional Memory: "Software Transactions: A Programming Language Perspective". Dan  also gave this talk at Google and UC Berkeley.

Dan Grossman is an assistant professor at the University of Washington in the Department of Computer Science and Engineering. His research includes programming language design, implementation and theory.

The focus of his talk is to explore the impact of transactions on the semantics and implementation of a programming language, and study how scalable mutithread transactions can be:

With multicore processors bringing parallel computing to the masses, there is an urgent need to make concurrent programming easier. Software transactions hold great promise for simplifying shared-memory concurrency, and they have received enormous attention from the research community in the last couple years.

He draws an interesting analogy between Transactional Memory and Garbage Collection:

Transactional Memory (TM) is to shared-memory concurrency
                                          as
Garbage Collection (GC) is to memory management

He explains that concurrent programming is hard just like memory management because you need to balance correctness and performance. He also argues that both are non-modular as the caller and callee must know about each other and often a small change can require wide-scale code change.

He suggests that the solution, just like for GC, is to move hand coded protocols to the language implementation, even though he acknowledges that Transactional Memory is an incomplete solution:

when memory conflict is a bad approximation of "cannot-be-run-in-parallel"

Dan introduces source-level formal semantics for 3 different isolation levels:

  • “Strong”: If one thread is in a transaction, no other thread may use shared memory or enter a transaction
  • “Weak-1-lock”: If one thread is in a transaction, no other thread may enter a transaction
  • “Weak-undo”: Like weak, plus a transaction may abort at any point, undoing its changes and restarting

He suggests that there is a widespread misconception:

“Weak” isolation violates the “all-at-once” property only if corresponding lock code has a race

He then continues on to establish several key theorems, including this one:

weak-undo allows behavior weak-1-lock does not

The second part of his talk is dedicated to apply these semantics to multicore systems. He starts by pointing out that uniprocessor is an important special case. He notes:

  • Threads communicating via shared memory don't execute in “true parallel”
  • Many language implementations assume it         
  • Multicore may assign one core to an application

Dan argues that in the case of the uniprocessor, Strong isolation of Transactional Memory can be achieved at little cost. There is little memory access overhead and rollbacks are rare. In addition only the writes are logged. He also provides some example of static optimizations that can be applied at the bytecode level to enhance the performance in the non-transactional code.

Dan's team is starting to work on the multithreaded transaction case. Dan questions an important assumption that people make:

Most implementations (hw or sw) assume code inside a transaction is single-threaded, but isolation and parallelism are orthogonal (and Amdahl’s Law will strike with manycore)

He points that there are good reasons to make this assumption because it makes your implementation simpler. As the number of core increases, the performance of the system will degrade because of this assumption. If you don't make this assumption, you need to introduce nested transactions at the language level. He also points that parallelizing the logging is an important problem to solve to achieve scalability.  Dan is wondering when Amdahl's law really requires these enhancements. He is suggesting that it could be when the core counts are in the hundreds.

1 comment

Reply

Broken link by Mikhail Franco Posted Mar 20, 2008 2:34 AM
  1. Back to top

    Broken link

    Mar 20, 2008 2:34 AM by Mikhail Franco

    Link to ppt is broken, should be just http://www.cs.washington.edu/homes/djg/slides/grossman_uwcse08.ppt Mik

Exclusive Content

Rationalizing the Presentation Tier

Thin client paradigm characterized by web applications is a kludge that needs to be repudiated. Old compromises are no longer needed and it's time to move the presentation tier to where it belongs.

Agile Project Management: Lessons Learned at Google

In this presentation filmed during QCon 2007, Jeff Sutherland, the creator of Scrum, talks about his visit at Google to do an analysis of Google's first implementation of Scrum.

AtomServer – The Power of Publishing for Data Distribution

In this article, Bryon Jacob and Chris Berry introduce AtomServer, their implementation of a full-fledged Atom Store based on Apache Abdera, which is now available as open source.

An Introduction to Virtualization

It is easy to think that virtualization applies only to servers. In reality the recent resurgence of the concept is also being applied to networking, storage, and application infrastructure.

REST Anti-Patterns

In this article, Stefan Tilkov explains some of the most common anti-patterns found in applications that claim to follow a "RESTful" design and suggests ways to avoid them.

Choosing between Routing and Orchestration in an ESB

In this article, Adrien Louis and Marc Dutoo discuss the differences and relative merits of using orchestration vs. routing in a typical ESB setup, and discuss various implementation options.

Enterprise Batch Processing with Spring

Wayne Lund discusses batch processing, Spring Batch objectives and features, scenarios for usage, Spring Batch architecture, scaling, example code, failures and retrying, and the future roadmap.

User Story Estimation Techniques

Developer Jay Fields draws on his experiences as a ThoughtWorks consultant to describe effective user story estimation techniques.