A DRBD Dual-Primary setup done right
Cluster filesystems like GFS2 have gained a lot of popularity lately.
Higher performance. Scalability. Fault tolerance. These features sound like heaven to any system architect. But cluster filesystems should be used with caution, because they add another layer of complexity to the cluster environment, one that you need to know how to handle properly.
Phil Reisner, the author of DRBD, points out three major questions you should ask yourself before you start hacking away at your console:
- Question 1: “Do I really, really have the need for a cluster filesystem?”
- Question 2: “Am I able to plan, verify and test my setup and the STONITH strategy extensively?”
- Question 3: Ask yourself question 1 again
Optionally, you may want to sleep on it between questions two and three.
Note: if you do not know what STONITH means, you definitely need to read up on dual-primary setups first. Download here:
Rest assured that the time spent on testing is well invested. A misconfigured dual-primary cluster will not merely misbehave: it can destroy the high availability of your system, because the failover mechanism simply will not work when a node fails, and, even more importantly, it can easily corrupt your data.
You assume you have 99.99% availability? Well, you might want to check your config again.
Let us show you how it’s done right!
Our new tech-guide builds on the dual-primary concepts explained in our previous tech-guides and walks you through the complete process of implementing GFS2 on top of DRBD data replication.
We guide you through the configuration of:
- DRBD, prepared for dual-primary operation
- CMAN and Pacemaker as cluster manager and cluster resource manager
- and finally, an example configuration of a fencing mechanism
You may ask what all this fencing and STONITH stuff is about. Why do I need it?
Well, here is the answer:
GFS2 lets you access your data from two different servers at the same time. To make sure that the same data is not altered by both nodes at once, GFS2 implements a locking mechanism called glocks. These glocks are continuously synchronized between both cluster nodes to prevent data corruption.
In case of a network outage between the nodes, this synchronization is interrupted and each node considers itself alone. In this situation each node can modify its data independently, as no lock synchronization is possible. You have just created a split-brain situation, which means that you now have diverging datasets. To resolve it, you have to decide which dataset is going to survive and discard the other one.
You can imagine that this decision won’t always be a painless one.
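To make the divergence concrete, here is a toy Python model, not DRBD code: each node keeps a hypothetical `blocks` dictionary standing in for its replica, and a write is mirrored to the peer only while the replication link is up. The names `Node`, `replicate` and `diverged_blocks` are illustrative assumptions, and real GFS2/DRBD behaviour is far more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy cluster node; `blocks` maps block number -> data on its replica."""
    name: str
    blocks: dict = field(default_factory=dict)


def replicate(src: Node, dst: Node, block: int, data: str, link_up: bool) -> None:
    """Write on `src`; mirror the write to `dst` only while the link is up."""
    src.blocks[block] = data
    if link_up:
        dst.blocks[block] = data


def diverged_blocks(a: Node, b: Node) -> list:
    """Return the sorted block numbers whose contents differ between replicas."""
    keys = set(a.blocks) | set(b.blocks)
    return sorted(k for k in keys if a.blocks.get(k) != b.blocks.get(k))


alpha, bravo = Node("alpha"), Node("bravo")
replicate(alpha, bravo, 0, "v1", link_up=True)  # replicas still agree

# Network outage: no lock synchronization, both nodes keep writing.
replicate(alpha, bravo, 1, "alpha-edit", link_up=False)
replicate(bravo, alpha, 1, "bravo-edit", link_up=False)
replicate(bravo, alpha, 2, "bravo-only", link_up=False)

print(diverged_blocks(alpha, bravo))  # [1, 2]

# Resolving the split brain by keeping bravo's replica throws away
# alpha's version of every diverged block that alpha had written.
lost = {k: alpha.blocks[k] for k in diverged_blocks(alpha, bravo) if k in alpha.blocks}
print(lost)  # {1: 'alpha-edit'}
```

Whichever replica you keep, the other node's diverged writes are gone; in this sketch, keeping `bravo` silently discards `alpha`'s edit of block 1.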
The only way to prevent this kind of situation is to bring your cluster into a defined state before the split brain can even occur. This is done simply by killing one node of the cluster, so that it cannot generate its own dataset. This mechanism is called STONITH (“Shoot The Other Node In The Head”).
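The toy Python model below sketches the idea under simplifying assumptions: `alive` stands for "powered on", and the hypothetical `stonith()` function plays the role of a fence agent driving an out-of-band power switch. A node that loses its peer first freezes its I/O and resumes writing only once fencing has succeeded, which loosely resembles DRBD's `fencing resource-and-stonith` policy combined with a fence-peer handler. None of these names belong to DRBD, Pacemaker or GFS2, and the fixed call order stands in for a real fencing race.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy cluster node; `alive` stands for "powered on"."""
    name: str
    alive: bool = True
    io_frozen: bool = False
    blocks: dict = field(default_factory=dict)

    def write(self, block: int, data: str) -> bool:
        # Dead or frozen nodes must not touch shared data.
        if not self.alive or self.io_frozen:
            return False
        self.blocks[block] = data
        return True


def stonith(requester: Node, target: Node) -> bool:
    """Hypothetical fence agent: power off `target` via an out-of-band switch.

    A node that has already been shot cannot shoot back, so only the first
    request in a fencing race succeeds.
    """
    if not requester.alive:
        return False
    target.alive = False
    return True


def on_peer_lost(node: Node, peer: Node) -> None:
    """Freeze I/O, fence the peer, and resume only if fencing succeeded."""
    node.io_frozen = True
    if stonith(node, peer):
        node.io_frozen = False


alpha, bravo = Node("alpha"), Node("bravo")
# Network outage: both nodes detect the loss of their peer.
on_peer_lost(alpha, bravo)  # alpha wins the fencing race
on_peer_lost(bravo, alpha)  # bravo is already powered off, so this fails

print(alpha.write(1, "alpha-edit"))  # True
print(bravo.write(1, "bravo-edit"))  # False
print([n.name for n in (alpha, bravo) if n.alive])  # ['alpha']
```

The important design point is the ordering: I/O stays frozen until the peer is confirmed dead, so there is never a moment in which two unsynchronized nodes can both write.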
Cluster-enabled applications
The truth is that a dual-primary DRBD cluster with GFS2 on top is not fully transparent to the service. The application you are planning to deploy has to support a clustered environment; in short, it has to know that it is not the only instance accessing the data. Unfortunately, only a handful of applications support this, so the use cases are fairly limited.
Nevertheless, there are some interesting scenarios in which a dual-primary DRBD setup makes a lot of sense.
- Live Migration: Some virtualization hypervisors require a cluster filesystem to support live migration. If you are using Citrix XenServer, we have already done a lot of the work for you. Have a look at our DRBD binaries for XenServer and our tech-guide Deploying DRBD with Citrix XenServer.
- Oracle Real Application Clusters (RAC): Oracle RAC also requires a cluster filesystem, so that multiple instances of the database management system can access one single database. In this case Oracle uses its own cluster filesystem, OCFS2, which is based on the same principles as GFS2.
- Samba4: As mentioned before, an application has to support cluster operation to work properly with a cluster filesystem like GFS2. Samba 4 brings a very interesting feature called clustered Samba, which provides load balancing and enhanced scalability by allowing multiple Samba instances to run on top of a cluster filesystem.
Now all that is left for you to do is download our tech-guide on GFS2 on a dual-primary DRBD and get to work. Keep in mind that this tech-guide focuses on this very topic only and cannot replace general knowledge of cluster configuration or the other considerations it involves.
For any feedback regarding this tech-guide, just drop us a line.