Fencing is the process of isolating a node from a computer cluster or protecting shared resources when a node appears to be malfunctioning. As the number of nodes in a cluster increases, so does the likelihood of failure.[1]
Fencing avoids data divergence when the cluster is partitioned. It is difficult to implement, mostly because independent network links for the fencing devices are often unavailable. Additional roadblocks add to the complexity, such as IPMI passwords being reset to defaults after firmware upgrades and security policies that require IPMI to be disabled. Unfortunately, these issues tend to result in fencing/STONITH being disabled, which leaves clusters vulnerable to split-brain.
DRBD 9 introduced the ability to replicate a resource 1 → N times; typically, N is 3 to 4 nodes. Replication has worked from the start, but the integration into Pacemaker was not finished. Completing it has been a main focus at LINBIT for the past few weeks, and thanks to Lars Ellenberg and Phil Reisner, that integration is now complete.
In addition to fencing, DRBD 9.0.7 now ships with the ability to make quorum-based decisions. Quorum serves the same purpose as fencing: avoiding data divergence. If the cluster is partitioned, only the partition that contains a majority of the nodes may keep a primary node or promote a node to primary.
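The majority rule can be illustrated with a short Python sketch. This is not DRBD's implementation: the function names `has_quorum` and `may_promote` and the node names are hypothetical, and the sketch assumes a plain strict-majority rule with no tie-breaking for evenly split clusters.

```python
def has_quorum(reachable_nodes: int, total_nodes: int) -> bool:
    """Return True if a partition holds a strict majority of the cluster.

    Assumption: plain strict majority; an even split gives quorum to nobody.
    """
    if total_nodes <= 0 or not 0 <= reachable_nodes <= total_nodes:
        raise ValueError("node counts are out of range")
    return 2 * reachable_nodes > total_nodes


def may_promote(partition: set, cluster: set) -> bool:
    """Decide whether a node in `partition` may become (or stay) primary."""
    # Only count nodes that are actually configured members of the cluster.
    return has_quorum(len(partition & cluster), len(cluster))


# Hypothetical three-node cluster that splits into two partitions.
cluster = {"alpha", "bravo", "charlie"}
print(may_promote({"alpha", "bravo"}, cluster))  # True: 2 of 3 is a majority
print(may_promote({"charlie"}, cluster))         # False: 1 of 3 is not
```

Because at most one partition can hold more than half of the nodes, at most one side can have a primary, which is what prevents the two sides from diverging. In a four-node cluster split two and two, this simplified rule leaves neither side with quorum.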
By using one of these methods, you can keep your data safe in the event of a hardware failure. Fencing, sometimes referred to as STONITH, is a vital part of any cluster and is always recommended.
Download the latest release HERE!