At the IEEE International Conference on Data Engineering (ICDE) 2019, eBay engineers presented a paper introducing a protocol for distributed ACID transactions using multiple databases, GRIT. Support for multiple databases is key to enabling GRIT's use across microservices, which are usually implemented in different languages and may use multiple underlying databases.
GRIT aims to fill a gap in existing technologies for ACID transactions, which cannot be easily extended to the multi-database case, write eBay engineers.
In environments that involve multiple independent databases, the traditional two-phase commit (2PC) protocol was essentially the only option for distributed transactions by the system without additional application effort. However, it does not work well in a scale-out platform due to long paths of potentially many coordinating participants and the locking required over the phases. On the other hand, using a transaction log executed by a framework such as Saga will incur complex compensating logic by applications and may have business implications due to irreversible partially successful transactions.
GRIT architecture is displayed in the following picture in the context of a two-database microservice application.
As the image above shows, GRIT includes two main blocks: one group of components specific to each database —a Database Transaction Manager (DBTM), a Database Transaction Log (DBTL), and a LogPlayer for each databas — and two global components in charge of coordination —the Global Transaction Manager (GTM), and the Global Transaction Log (GTL).
A GRIT transaction occurs in three phases. During the execution of a transaction, database services collect the read-set and write-set of the transaction, without actually modifying any data. When the transaction is committed, each database submits its read-set and write-set to its DBTM, which makes a local commit decision analyzing them. All involved DBTM submit their local decisions to the GTM, which makes the global commit decision. Finally, if the transaction succeeds, the LogPlayers will send the entries collected in the DBTL to the databases, which store the data.
Overall, our approach avoids pessimistic locking during both execution and commit process and avoids waiting for physical commit. We take the optimistic approach and also make the commit process very efficient by leveraging logical commit logs and moving the physical database changes out of the commit decision process with deterministic database technology, which is similar to log play in replication.
It is worth noting GRIT can also be applied to single-database applications. In those cases, the global components are not required and the commit transaction is of minor complexity. eBay engineers show a GRIT setup meant to extend an existing database to support ACID transactions across multiple availability zones. Do no miss the original post if you are interested in the full details.
Community comments
Some thoughts
by Guy Pardon,
what is the exact difference with XA-2PC?
by Shen Haiqiang,
GRIT Protocol DOES NOT Enable Distributed Transactions across Multi-Database Microservices
by Michael Poulin,
Some thoughts
by Guy Pardon,
Your message is awaiting moderation. Thank you for participating in the discussion.
Looks good, but I do have some thoughts.
First of all, the traditional 2-phase commit can scale, as we (Atomikos) have shown. So it's not per se needed to implement additional logging techniques on top of that.
Second, optimistic locking is great but deteriorates for hot spot data.
But it is true that microservice transactions can be a challenge. We've compiled a free ebook with our decades of experience in the matter (including distributed SOA, pre-microservices). Grab your free copy here:
www.atomikos.com/Blog/MicroserviceTransactionPa...
what is the exact difference with XA-2PC?
by Shen Haiqiang,
Your message is awaiting moderation. Thank you for participating in the discussion.
I don't think this protocol is different from XA-2PC protocol in fact.
GTM is TM in XA, DBTM is RM in XA. GTM should ask all DBTMs for local commit decision and then make the global commit decision. The only difference is that there is a logical layer in GRIT, it separate physical data change to an async process.
GRIT Protocol DOES NOT Enable Distributed Transactions across Multi-Database Microservices
by Michael Poulin,
Your message is awaiting moderation. Thank you for participating in the discussion.
I have many questions to this Protocole and I am sure it does not work in distributed (world-wide) environment. I doubt that it was tested at all. Here are my questions:
1. << Support for multiple databases is key to enabling GRIT's use across microservices, which are usually implemented in different languages and may use multiple underlying databases>>:
-What types of databases are meant here - relational and/or noSQL as well?
-Where Microservices are used in the article and why Microservices?
-Does it also mean that one Microservice can use multiple databases? (this is an anti-scaling pattern - a database is an anchor).
2. << [2PC] does not work well in a scale-out platform due to long paths of potentially many coordinating participants and the locking required over the phases>> - is it meant locking of the same shared table in the shared database?
3. <<Saga will incur complex compensating logic by applications and may have business implications due to irreversible partially successful transactions>> - what are these "irreversible partially successful transactions" - why they are "partially successful"? Which successful transaction is reversible?
4. What does mean "DB Service" and "Entity Service"? In SOA, there are Data Access Services, which are separate from the Business Services and databases. If "Entity Service" is just a set of data and CRUD operation, it is neither a SOA Service nor a Microservices (since there is no business functionality in it). It is rather a database driver on HTTP steroids.
5) Do you agree with their assumption that their protocol may access all databases under different ownership?
5) Is this right understanding that Database Transaction Manager (DBTM), a Database Transaction Log (DBTL) have nothing to do with the original DB;s transactions and logs?
6) I doubt that the efficiency of GRIT is valuable since they have <<two global components in charge of coordination>>, i.e. they deprecate distribution and join London- and Melbourne-located databases. Would you agree? Such coordination is highly risky due to the number of independently managed networks.
7) << During the execution of a transaction, database services collect the read-set and write-set of the transaction>> what are these sets particularly (in the sense of optimistic locking)? Due to network latency (long time), a probability of optimistic locking compromise is very high. Correct?
8) <<When the transaction is committed, each database submits its read-set and write-set to its DBTM>> - in other words, each Database Service (?) records its decision with DBTM, but this relates to the transactions related to given transaction. At the same time, the same databases may be read and written many times via different local transactions (or even by concurrent transactions). Does it make sense? What value of aforementioned registration (it is likely it is not valid already)?
9) <<All involved DBTM submit their local decisions to the GTM>>, i.e. adds more latency. Correct?
10) <<Finally, if the transaction succeeds, the LogPlayers will send the entries collected in the DBTM to the databases, which store the data>> - well, due to latency, again, the time between coordinated decision and distribution of entries may be enough to compromise optimistic locking...
Altogether, this reminds me 1) a CORBA-based Type 4 database driver (about 2001) that was more complex then the database and the application together, and 2) an attempt to push an old sock in a new shoe. Peter Drucker once said: "There is nothing quite so useless, as doing with great efficiency, something that should not be done at all." Would you agree with my opinion about this "solution"?
Thank you in advance,
- Michael
P.S. The Code of DevOps Practice states: "Not everything that works well on a small stand-alone task works equally well on a bigger task".