While attending OpenStack Summit in Atlanta, I sat in a talk about the difficulties of implementing High Availability (HA) clusters. At one point, the speaker presented a picture of a split-brain, discussed the challenges in resolving them, and implementing STONITH in certain environments. As many of you know, “split-brain” is a condition that can happen when each node in a cluster thinks that it is the only active node. The system as a whole loses grip on its “state”; nodes can go rogue, and data sets can diverge without making it clear which one is primary. Data loss or data corruption
can result, but there are ways to make sure this doesn’t happen, so I was interested in probing further.
Fencing is not always the solution
To make it more interesting, it turned out that the speaker’s company uses DRBD and Pacemaker for HA, a setup that is very familiar to us. After the talk, I approached the speaker and recommended that they consider “fencing” as a way to avoid split-brain. Fencing regulates access to a shared resource and can be a good safeguard. As the resource needs separate communication path best practices suggest not using the same one that it is trying to regulate, so it needs a separate communication path. Unfortunately, in his environment, redundant networking was not possible. We needed another method.
Split brain is solved via DRBD Quorum
After talking to the speaker, it was clear to me that a new option for avoiding split brain or diverging data sets was needed since existing solutions may not always be feasible in certain infrastructures. This got me thinking about the various options for avoiding split-brain and how fencing could be implemented by using the built-in communication found in DRBD 9. It turns out that the capability of mirroring more than two nodes, found in DRBD 9 is a viable solution.
That idea sparked the work on the newest feature in DRBD: Quorum.
Shortly thereafter, the LINBIT team developed and integrated a working solution into DRBD. The code was pushed to the LINBIT repository and ready for testing.
Interest was almost immediate!
Later on, I happened to meet a few folks from IBM UK. They were working on IBM MQ Advanced Software, the well-known messaging middleware software that helps integrate applications and data across multiple platforms. They intended to use DRBD for their replication needs and quickly became interested in the idea of using a Quorum mechanism to mitigate split-brain situations.
DRBD Quorum takes new perspective
The DRBD Quorum feature takes a new approach to avoiding data divergence. A cluster partition may only modify the replicated data set if the number of nodes that can communicate is greater than half of the overall number of nodes within the defined cluster. By only allowing writes on a node that has access to over half the nodes in a given partition, we avoid creating a diverging data set.
The initial implementation of this feature would cause any node that lost Quorum (and was running the application/data set) to be rebooted. Removing access to the data set is required to ensure the node stops modifying data. After extensive testing, the IBM team suggested a new idea that instead of rebooting the node, terminate the application. This action would then trigger the already available recovery process, forcing services to migrate to a node with Quorum!
Attractive alternative to fencing
As usual, the devil is in the details. Getting the implementation right with the appropriate resync decisions was not as straightforward as one might think. In addition to our own internal testing, many IBM engineers also tested it as well. We are happy to report that current implementation does exactly what was expected!
If you need to mirror your data set three times, the new DRBD Quorum feature is an attractive alternative to hardware fencing.
In case you want to learn more about the Quorum implementation in DRBD
please see the DRBD9 user’s guide: