What is a distributed transaction?
Microservices architecture has been a very popular architectural pattern in recent times. However, one common problem is how to manage distributed transactions across multiple microservices.
When a microservice architecture decomposes a monolithic system into self-encapsulated services, it can break transactions. This means a local transaction in the monolithic system is now distributed into multiple services that will be called in a sequence.
Let's try to understand this concept with a hypothetical train ticket booking system. Consider the monolithic ticket booking application below.
In the train ticket booking example above, if an actor sends a book ticket action to a monolithic system, the system will create a local database transaction that works over multiple database tables (account table, booking table). If any step fails, the transaction can roll back, and data consistency is guaranteed by the database's ACID (Atomicity, Consistency, Isolation, Durability) properties.
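The local transaction described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumed names: the `account` and `booking` table layouts, the `setup` helper, and the `book_ticket` function are hypothetical and not taken from any real booking system. Using the connection as a context manager commits when the block succeeds and rolls back when an exception is raised.

```python
import sqlite3


def setup():
    """Create a hypothetical in-memory schema with one funded account."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("CREATE TABLE booking (seat_no TEXT PRIMARY KEY, account_id INTEGER NOT NULL)")
    conn.execute("INSERT INTO account VALUES (1, 100)")
    conn.commit()
    return conn


def book_ticket(conn, account_id, seat_no, price):
    """Debit the account and record the booking in one local transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ? AND balance >= ?",
                (price, account_id, price),
            )
            if cur.rowcount != 1:
                raise ValueError("insufficient funds or unknown account")
            # Fails with IntegrityError if the seat is already booked,
            # which also undoes the debit above.
            conn.execute(
                "INSERT INTO booking (seat_no, account_id) VALUES (?, ?)",
                (seat_no, account_id),
            )
        return True
    except (ValueError, sqlite3.Error):
        return False
```

If the same seat is booked twice, the second call fails on the insert and the debit made in that call is rolled back with it, so the account is charged only once.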
To achieve high horizontal scalability, we can break this monolith into multiple microservices. When we break the application down into microservices, we face two major challenges:
How to maintain a transaction’s atomicity?
How to manage the transaction isolation level for concurrent requests?
We can address these challenges with the two-phase commit (2PC) pattern or the Saga pattern. Let's try to understand each in a bit more detail.
Two-phase commit (2PC):
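Two-phase commit is a well-known protocol in relational database systems. It can also be used in a microservices architecture to implement distributed transactions. In a two-phase commit, an orchestrating service (the coordinator) contains most of the logic, and the participating services are the ones on which the actions are performed. It works in two phases: in the prepare phase, the coordinator asks every participant whether it is ready to commit; in the commit phase, the coordinator tells all participants to commit if every one of them agreed, or to roll back otherwise.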
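The two phases can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production protocol: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names, and a real participant would hold database locks and persist its vote instead of setting a string.

```python
class Participant:
    """Hypothetical participant service that keeps its state in memory."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes only if the local work could be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit everywhere only if every participant votes yes."""
    prepared = []
    # Phase 1 (prepare): collect votes, stopping at the first "no".
    for participant in participants:
        if participant.prepare():
            prepared.append(participant)
        else:
            for done in prepared:
                done.rollback()
            return False
    # Phase 2 (commit): every participant voted yes.
    for participant in participants:
        participant.commit()
    return True
```

The sketch assumes every call returns. It does not model a participant or the coordinator crashing between the two phases, which is where the practical problems with 2PC come from.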
Even though 2PC can help provide transaction management in a distributed system, it is considered to be impractical for microservices architecture. The reasons are below:
The coordinator can become a single point of failure and a performance bottleneck.
If one microservice becomes unavailable in the commit phase, there is no mechanism to roll back the transactions that the other services have already committed.
Other services must wait until the first service finishes its confirmation.
The resources used by the services are locked until the whole transaction is complete.
Two-phase commits are slow by design due to their dependence on the transaction coordinator.
Saga pattern
The Saga pattern is another widely used pattern for distributed transactions. Unlike 2PC, which is synchronous, the Saga pattern is asynchronous and reactive. In a Saga, the distributed transaction is fulfilled by asynchronous local transactions on all related microservices. The microservices communicate with each other through an event bus. The Saga guarantees that either all operations complete successfully, or the corresponding compensation actions are run for all executed operations to roll back any work previously done.
In the above diagram, each service publishes an event reporting success or failure, and the subscribers of that event take the corresponding action. For example, if receive payment fails, the receive payment service sends a failure notification to the event hub, and the unblock seat service reverts the seat blocking.
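The event flow just described might look like the following sketch. It makes several assumptions: the `EventBus` class is a synchronous, in-process stand-in for a real asynchronous message broker, the event names are invented for illustration, and the `confirm_seat` handler is a hypothetical final step that the text does not mention.

```python
from collections import defaultdict


class EventBus:
    """Toy in-process event hub; a real system would use a message broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self.log = []  # event types in the order they were published

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.log.append(event_type)
        for handler in list(self._subscribers[event_type]):
            handler(payload)


def wire_services(bus, seats, payment_succeeds):
    """Register each hypothetical service as a subscriber of the events it reacts to."""

    def block_seat(booking):
        seats[booking["seat"]] = "blocked"
        bus.publish("seat_blocked", booking)

    def receive_payment(booking):
        # payment_succeeds simulates the outcome of the payment step.
        outcome = "payment_received" if payment_succeeds else "payment_failed"
        bus.publish(outcome, booking)

    def confirm_seat(booking):
        seats[booking["seat"]] = "booked"

    def unblock_seat(booking):
        # Compensation: revert the seat blocking after a failed payment.
        seats[booking["seat"]] = "free"

    bus.subscribe("booking_requested", block_seat)
    bus.subscribe("seat_blocked", receive_payment)
    bus.subscribe("payment_received", confirm_seat)
    bus.subscribe("payment_failed", unblock_seat)
```

Publishing `booking_requested` with `payment_succeeds=False` blocks the seat, emits `payment_failed`, and leaves the seat free again. No service calls another directly; each one only reacts to events.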
There are various ways to implement the Saga pattern, such as choreography and orchestration.
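In the orchestration style, a central component calls each step and runs the compensations itself when something fails. The following is a minimal sketch under that assumption; `run_saga` and the booking step functions are hypothetical names, and the in-memory `state` dictionary stands in for the separate data stores of the services.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, the compensations of the steps that already
    succeeded are run in reverse order and False is returned.
    """
    completed = []  # compensations for steps that already succeeded
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True


# Hypothetical booking steps operating on in-memory state.
state = {"seat": "free", "paid": False}


def block_seat():
    state["seat"] = "blocked"


def unblock_seat():
    state["seat"] = "free"


def receive_payment():
    raise RuntimeError("payment declined")  # simulate a failing step


def refund_payment():
    state["paid"] = False


booked = run_saga([(block_seat, unblock_seat), (receive_payment, refund_payment)])
```

Here `booked` is `False` and the seat is free again, because the orchestrator ran `unblock_seat` after the payment step failed. The compensation of the failing step itself is not run, since that step never completed.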
Advantages of Saga:
Maintain data consistency across multiple microservices without tight coupling.
Perform better compared to 2PC.
Offer no single point of failure.
Keep the overall state of the transaction eventually consistent.
Disadvantages of Saga:
The Saga pattern is difficult to debug, especially when many microservices are involved.
The event messages could become difficult to maintain if the system gets complex.
Another disadvantage of the Saga pattern is it does not have read isolation. For example, the customer could see the order being created, but in the next second, the order is removed due to a compensation transaction.
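The missing read isolation can be made concrete with a small sketch. Everything here is hypothetical: the `orders` dictionary stands in for an order service's data store, and `read_order` plays the role of a customer request that happens to arrive between two saga steps.

```python
orders = {}    # hypothetical order service store
observed = []  # what a concurrent reader sees at each point in time


def read_order(order_id):
    observed.append(orders.get(order_id))


def create_order(order_id):
    # The local transaction commits immediately, so the order is visible.
    orders[order_id] = "created"


def cancel_order(order_id):
    # Compensation for create_order.
    orders.pop(order_id, None)


def charge_payment(order_id):
    raise RuntimeError("payment declined")  # simulate a failing later step


def place_order(order_id):
    create_order(order_id)
    read_order(order_id)  # a reader arriving between the two saga steps
    try:
        charge_payment(order_id)
    except RuntimeError:
        cancel_order(order_id)
    read_order(order_id)  # the same reader a moment later


place_order("o-1")
```

After the call, `observed` holds `"created"` followed by `None`: the reader saw an order that the saga later removed.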
In a nutshell, in the microservices world, where each microservice has its own data store, keeping data consistent is complex and difficult. We need to think through all possible scenarios while designing the system and make the design extensible and flexible.