Split Brain? Never Again? A New Solution for an Old Problem

While attending OpenStack Summit in Atlanta, I sat in a talk about the difficulties of implementing High Availability (HA) clusters. At one point, the speaker presented a picture of a split brain, discussed the challenges of resolving one, and described the difficulty of implementing STONITH in certain environments. As many of you know, “split brain” is a condition that can occur when each node in a cluster thinks that it is the only active node. The system as a whole loses track of its state; nodes can go rogue, and data sets can diverge without any clear indication of which one is primary. Data loss or data corruption can result, but there are ways to make sure this doesn’t happen, so I was interested in probing further.

Fencing is not always the solution

To make it more interesting, it turned out that the speaker’s company uses DRBD and Pacemaker for HA, a setup that is very familiar to us. After the talk, I approached the speaker and recommended that they consider “fencing” as a way to avoid split brain. Fencing regulates access to a shared resource and can be a good safeguard. However, best practice dictates that the fencing mechanism not rely on the same communication path as the resource it regulates, so it needs a separate communication path. Unfortunately, in his environment, redundant networking was not possible. We needed another method.

Split brain is solved via DRBD Quorum

After talking to the speaker, it was clear to me that a new option for avoiding split brain or diverging data sets was needed, since existing solutions may not always be feasible in certain infrastructures. This got me thinking about the various options for avoiding split brain and how fencing could be implemented using the built-in communication found in DRBD 9. It turns out that the ability to mirror data across more than two nodes, introduced in DRBD 9, provides a viable solution.

That idea sparked the work on the newest feature in DRBD: Quorum.

Shortly thereafter, the LINBIT team developed and integrated a working solution into DRBD. The code was pushed to the LINBIT repository and was ready for testing.

Interest was almost immediate!

Later on, I happened to meet a few folks from IBM UK. They were working on IBM MQ Advanced, the well-known messaging middleware that helps integrate applications and data across multiple platforms. They intended to use DRBD for their replication needs and quickly became interested in the idea of using a Quorum mechanism to mitigate split-brain situations.

DRBD Quorum takes a new perspective

The DRBD Quorum feature takes a new approach to avoiding data divergence. A cluster partition may only modify the replicated data set if the number of nodes that can communicate with each other is greater than half of the overall number of nodes within the defined cluster. By only allowing writes in a partition that contains more than half of the cluster’s nodes, we avoid creating diverging data sets.

The initial implementation of this feature would cause any node that lost Quorum (and was running the application/data set) to be rebooted. Removing access to the data set is required to ensure the node stops modifying data. After extensive testing, the IBM team suggested a refinement: instead of rebooting the node, terminate the application. This action would then trigger the already available recovery process, forcing services to migrate to a node with Quorum!
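In DRBD 9 configuration terms, Quorum is enabled in the options section of a resource. The fragment below is only a minimal sketch based on the options described in the DRBD 9 user’s guide (linked below); the resource name is a placeholder and the rest of the resource definition is omitted:

    resource r0 {
      options {
        quorum majority;        # a partition may write only while it sees more than half of the nodes
        on-no-quorum io-error;  # on quorum loss, return I/O errors so the application terminates
                                # (rather than rebooting the whole node)
      }
      # ... device, disk, and connection definitions omitted ...
    }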

Attractive alternative to fencing

As usual, the devil is in the details. Getting the implementation right, with the appropriate resync decisions, was not as straightforward as one might think. In addition to our own internal testing, many IBM engineers tested it as well. We are happy to report that the current implementation does exactly what was expected!

Bottom line:

If you need to mirror your data set three times, the new DRBD Quorum feature is an attractive alternative to hardware fencing.

If you want to learn more about the Quorum implementation in DRBD, please see the DRBD 9 user’s guide:
https://docs.linbit.com/docs/users-guide-9.0/#s-feature-quorum
https://docs.linbit.com/docs/users-guide-9.0/#s-configuring-quorum

LINBIT’s DRBD ships with an integration for VCS

The LINBIT DRBD software has been updated with an integration for Veritas Infoscale Availability (VIA). VIA, formerly known as Veritas Cluster Server (VCS), is a proprietary cluster manager for building highly available clusters on Linux. Typical examples of clustered applications are network file sharing, databases, or e-commerce websites. VCS solves the same problem as the open source Pacemaker project.

Yet, in contrast to Pacemaker, VCS has a long history on the Unix platform. VCS came to Linux as Linux began to surpass legacy Unix platforms. In addition to its longevity, VCS offers a strong and clean user experience. For example, VCS is ahead of the Pacemaker software when it comes to clarity of log files. Notably, the Veritas Cluster Server has slightly fewer features than Pacemaker. (With great power comes complexity!)

VCS integration for DRBD

Since January 2018, DRBD has been shipping with an integration for VCS. Users are now able to use VCS instead of Pacemaker and even control DRBD via VCS. The integration consists of two agents, DRBDConfigure and DRBDPrimary, which enable drbd-8.4 and drbd-9.0 for use with VCS.
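As a rough illustration only, a VCS service group using these agents might be declared in main.cf along the following lines. The group, resource, and node names here are placeholders, and the agents’ actual attributes are defined in the drbd-utils repository linked below:

    group drbd_sg (
        SystemList = { node-a = 0, node-b = 1 }
        AutoStartList = { node-a }
        )

        DRBDConfigure drbd_config (
            )

        DRBDPrimary drbd_primary (
            )

        drbd_primary requires drbd_config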

Full documentation can be found here on our website:

https://docs.linbit.com/docs/users-guide-9.0/#s-feature-VCS

and

https://github.com/LINBIT/drbd-utils/tree/master/scripts/VCS

Besides VCS, LINBIT DRBD supports a variety of Linux cluster software so you can keep your systems up and running:

Pacemaker 1.0.11 and up
Heartbeat 3.0.5 and up
Corosync 2.x and up

 

Reach out to [email protected] for more information.

We are driven by the passion of keeping the digital world running. That’s why hundreds of customers trust in our expertise, services and products. Our open source product DRBD has been installed several million times. LINBIT established DRBD® as the industry standard for High Availability (HA) and data redundancy for mission-critical systems. DRBD enables disaster recovery and HA for any application on Linux, including iSCSI, NFS, MySQL, Postgres, Oracle, virtualization and more.

 

Why Does Higher Education Require Always-On Capabilities?

People understand the importance of hospital systems needing to be Highly Available. This is easy to explain since people’s LIVES depend on medical equipment and information being accessible at all times. Likewise, people understand the importance of banks needing High Availability (HA): they expect access to their MONEY on demand and want it protected. You don’t have to be a techie to quickly understand why hospitals and banks need to be constantly available. However, the need for HA at educational institutions is a bit more difficult to identify at first, because they are not often thought of as places where ‘mission-critical’ systems are a real requirement. I believe the story is told less because it has an underwhelming shock factor: people’s lives are not at stake, nor is their money hanging in the balance. At LINBIT, we have many educational customers, including prestigious universities, and we wanted to get their perspective on why HA and why LINBIT.

Dreaded Day of Downtime

Some say that no one dreads a day of downtime like a storage admin.

I disagree. Sure, the storage admins might be responsible for recovering a whole organization if an outage occurs; and sure, they might be the ones who lose their jobs from an unexpected debacle, but I would speculate that others have more to lose.

First, the company’s reputation takes a big, possibly irreparable hit with both clients and employees. Damage control usually lasts far longer than the original outage. Take the United Airlines case from earlier in 2017, when a computer malfunction led to the grounding of all domestic flights. Airports across the country were forced to tweet out messages about the technical issues after receiving an overwhelming number of complaints. After an outage such as this one, it can take months or years to repair the trust with your customers. Depending upon the criticality of the services, a company could go bankrupt. Despite all this, even the company isn’t the biggest loser; it is the end user, and that is what the rest of this post will focus on.

Let’s say you’re a senior in college. It’s spring term, and graduation is just one week away. Your school has an online system for submitting assignments, which are due at midnight the day before finals week. Like most students at the school, you log into the online assignment submission module, just like you have always done. Except this time, you get a spinning wheel. Nothing will load. It must be your internet connection. You call a friend to have her submit your papers, but she can’t log in either. The culprit: the system is down.

Now, it’s 10:00 PM and you need to submit your math assignment before midnight. At 11:00 PM you start to panic. You can’t log in and neither can your classmates. Everyone is scrambling. You send a hastily written email to your professor explaining the issue. She is unforgiving because you shouldn’t have procrastinated in the first place. At 1:00 AM, you refresh the system and everything is working (slowly), but the deadlines have passed. The system won’t let you submit anything. Your heart sinks as you realize that without that project, you will fail your math class and not be able to graduate.

This system outage caused heartache, stress and uncertainty for the students and teachers, along with a whole lot of pain for the administrators. The kicker is that the downtime happened when traffic was anticipated to be the highest! Of course the servers are going to be overloaded during the last week of spring term. Yet, notoriously, the university will send an email stating that it experienced higher-than-expected loads and that, ultimately, it wasn’t prepared for them.

During this time, traffic was 15 times its normal usage, and the hypervisor hosting the NFS server and the file sharing system was flooded with requests. It blew a fan and eventually overheated. Sure, the data was still safe inside the SAN on the backend. However, none of that mattered when the students couldn’t access the data until the admin rebuilt the hypervisor. By the time the server was back up and running, the damage was done.

High Availability isn’t a simple concept, but it is critical for your organization, your credibility, and even more importantly, for your end users or customers. In today’s world, the bar for “uptime” is monstrously high; therefore, downtime is simply unacceptable.

If you’re a student, an admin, or a simple system user, I have a question for you (and don’t just think about yourself, think about your boss, colleagues, and clients):

What would your day look like if your services went unresponsive right… NOW?!
Learn more about the costs and drivers of data loss, and how to avoid it, by reading the paper from OrionX Research.

The Top Issues and Topics for HA-DR in 2018

2017 is coming to a close and it is a good time to look back and then look forward. Thank you to our customers, partners, and the broader open source community for your participation; 2017 was a year of many accomplishments for LINBIT. We celebrated over 1.6 million downloads of DRBD, expanded into China, and released four new technical guides: HA NFS on RHEL 7, HA iSCSI on RHEL 7, HA & DR for ActiveMQ, and DRBD with Encryption.

Deploy a DRBD/Pacemaker Cluster using Ansible

Admins and potential clients looking to get a feel for managing a DRBD/Pacemaker cluster fairly often ask us, “Do you have a sandbox cluster we can play around in?” Instead of spinning up some cloud instances and doling out access, we decided it would be better for our potential clients to be able to see how it all works in their actual environment. Ansible seemed like the best way to create a “one size fits all” solution for deploying such clusters into an unknown environment, and after a few days of hacking together a playbook, it proved to be a good choice.

The end result was an Ansible playbook that can deploy a few different cluster configurations onto a pair of nodes in any environment. The playbook prompts the user for some inputs that specify which type of cluster to deploy, which LINBIT contract to register the target nodes with, and which credentials to use for said registration; all of these can be set in your inventory file or passed as extra arguments on the command line to avoid prompting. After the playbook runs, you’re left with an initialized DRBD device and Pacemaker cluster at the very least, or a full-blown HA cluster serving out either iSCSI or NFS (expect more later) that you can test with to your heart’s content.
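As a rough, non-authoritative example of the non-interactive form, the choices could be supplied as extra variables on the command line. The playbook filename and variable names below are placeholders; the real ones are documented in the repository linked below:

    ansible-playbook -i inventory site.yml \
      --extra-vars "cluster_type=nfs linbit_contract=12345 linbit_user=example linbit_pass=secret"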

You can find directions and my Ansible playbook’s repo on GitHub.

 

LINBIT Delivers High Availability and Disaster Recovery for Apache ActiveMQ Messaging Software

LINBIT Solution Simplifies HA and DR for ActiveMQ

BEAVERTON, Ore., Dec. 6, 2017 — LINBIT, a leader in open source High Availability (HA), Software Defined Storage (SDS), Disaster Recovery (DR) and the force behind the DRBD software, today announced that it is bringing disaster recovery capabilities to Apache ActiveMQ™, the most popular open source messaging and Enterprise Integration Pattern (EIP) server software.  

The LINBIT solution simplifies HA and DR for ActiveMQ because it does not require a clustered file system or shared database, a common requirement in current HA/DR implementations.

“Reliable communication in a distributed environment is a critical part of modern IT systems,” said Philipp Reisner, CEO of LINBIT. “LINBIT DR for ActiveMQ reduces cost and complexity for data centers and mitigates the risk often seen with SAN, clustered file systems, or shared databases.”

At TruckPro, the LINBIT DRBD software “is used primarily for resiliency,” stated Henry Santamaria, Director of Infrastructure. “Uptime is important for our business and anything we can do to quickly recover from any issue is paramount. Our investment in LINBIT yielded a noticeable increase in performance and stability which we did not have before.”

Known for its stability and performance over the last 15 years, LINBIT software is used by thousands of organizations across the globe, and is embedded in products from independent software vendors and established equipment manufacturers under OEM agreements. “With over 10,000 downloads per month, it is easy to see why even the most demanding environments rely on LINBIT to reduce risk and improve performance,” said Brian Hellman, LINBIT COO.

About LINBIT (http://www.linbit.com)

LINBIT is the force behind DRBD and the de facto open standard for High Availability (HA) software for enterprise and cloud computing. The LINBIT DRBD software is deployed in thousands of mission-critical environments worldwide to provide High Availability (HA), Geo Clustering for Disaster Recovery (DR), and Software Defined Storage (SDS) for OpenStack based clouds. Visit us at http://www.LINBIT.com, https://twitter.com/linbit, or https://www.linkedin.com/company/linbit. LINBIT is Keeping the Digital World Running.

Read it on PRWeb »

DRBD vs. CEPH

Guest blog by Jason Mayoral

DRBD

DRBD works by inserting a thin layer in between the file system (and the buffer cache) and the disk driver. The DRBD kernel module captures all requests from the file system and splits them into two paths.

So, how does the actual communication occur? How do two separate servers optimize data protection?

DRBD facilitates communication by mirroring data between two separate servers: one server, although passive, is usually a direct copy of the other. Any data written to the primary server is simultaneously copied to the secondary one through a real-time communication system. The passive server immediately replicates any change made to the data.

DRBD 8.x works on two nodes at a time: one is given the role of the primary node, the other a secondary role. Reads and writes can only occur on the primary node.
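A minimal two-node resource configuration illustrates this layout; the host names, devices, and addresses here are placeholders:

    resource r0 {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      meta-disk internal;

      on node-a {
        address 10.0.0.1:7789;
      }
      on node-b {
        address 10.0.0.2:7789;
      }
    }

With a resource like this defined on both hosts, drbdadm up r0 brings the device up and drbdadm primary r0 promotes the node that should mount the file system.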

The secondary node must not mount the file system, not even in read-only mode. While it’s true to say that the secondary node sees all updates on the primary node, it can’t expose these updates to the file system, as DRBD is completely file system agnostic.

One write goes to the actual disk and another to a mirrored disk on a peer node. If the first node fails, the file system can be brought up on the opposing node and the data will be available for use.

DRBD has no precise knowledge of the file system and, as such, it has no way of communicating the changes upstream to the file system driver. The two-at-a-time rule does not actually prevent DRBD from operating on more than two nodes.

Moreover, DRBD 9.x supports multiple peer nodes, meaning one peer might be a synchronous mirror in the local datacenter while another secondary might be an asynchronous mirror in a remote site.
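As a hedged sketch of what that can look like, a DRBD 9 resource simply lists more than two hosts (each with a node-id) and lets them connect in a mesh; the names and addresses are placeholders:

    resource r0 {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      meta-disk internal;

      on node-a  { node-id 0; address 10.0.0.1:7789; }
      on node-b  { node-id 1; address 10.0.0.2:7789; }
      on node-dr { node-id 2; address 192.0.2.10:7789; }

      connection-mesh {
        hosts node-a node-b node-dr;
      }
    }

The replication protocol can then be tuned per connection, for example protocol C between the local pair and protocol A toward the remote site; see the DRBD 9 user’s guide for the exact connection syntax.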

Again, the passive server only becomes functional when the primary one fails. When such a failure occurs, Pacemaker immediately recognizes the mishap and shifts services to the secondary server. This failover can be either manual or automatic. Users who prefer manual failover must authorize the system to shift to the passive server when the primary one fails.

CEPH

Ceph is open source software intended to provide highly scalable object, block, and file-based storage in a unified system.

Ceph storage clusters are designed to run on commodity hardware, using an algorithm called CRUSH (Controlled Replication Under Scalable Hashing) to ensure data is evenly distributed across the cluster and that all cluster nodes can retrieve data quickly without any centralized bottlenecks.

Ceph object storage is accessible through REST-based application programming interfaces (APIs) compatible with Amazon Simple Storage Service (S3) and OpenStack Swift, as well as a native API for integration with software applications.

Ceph block storage uses a Ceph Block Device, which is a virtual disk that can be attached to bare-metal Linux-based servers or virtual machines. The Ceph Reliable Autonomic Distributed Object Store (RADOS) provides block storage capabilities, such as snapshots and replication. The Ceph RADOS Block Device is integrated to work as a back end with OpenStack Block Storage.

Object storage

Ceph implements distributed object storage. Ceph’s software libraries provide client applications with direct access to the reliable autonomic distributed object store (RADOS) object-based storage system, and also provide a foundation for some of Ceph’s features, including RADOS Block Device (RBD), RADOS Gateway, and the Ceph File System.

The librados software libraries provide access in C, C++, Java, PHP, and Python. The RADOS Gateway also exposes the object store as a RESTful interface which can present as both native Amazon S3 and OpenStack Swift APIs.

Block storage

Ceph’s object storage system allows users to mount Ceph as a thin-provisioned block device. When an application writes data to Ceph using a block device, Ceph automatically stripes and replicates the data across the cluster. Ceph’s RADOS Block Device (RBD) also integrates with Kernel-based Virtual Machines (KVMs).
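As a rough illustration (the pool and image names are placeholders), creating an RBD image and using it from a Linux host typically looks something like this:

    # create a 10 GiB image in an existing pool
    rbd create mypool/myimage --size 10240

    # map it as a local block device and use it like any disk
    rbd map mypool/myimage        # appears as, e.g., /dev/rbd0
    mkfs.ext4 /dev/rbd0
    mount /dev/rbd0 /mnt/myimage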

Ceph RBD interfaces with the same Ceph object storage system that provides the librados interface and the CephFS file system, and it stores block device images as objects. Since RBD is built on librados, RBD inherits librados’s abilities, including read-only snapshots and revert to snapshot. By striping images across the cluster, Ceph improves read access performance for large block device images.

The block device can be virtualized, providing block storage to virtual machines, in virtualization platforms such as Apache CloudStack, OpenStack, OpenNebula, Ganeti, and Proxmox Virtual Environment.

Guest blog by Jason Mayoral ( www.rebelcorp.us )

Don’t Settle for Downtime

Innovative Data Storage Can Save Cash, Headaches, and Your Data

Storage Downtime is Unacceptable

When the network goes down, everyone is mildly annoyed, but when the storage goes down, “everyone loses their minds,” as the Joker would say. And for good reason. No one likes losing payroll data, shipments, customer information, financial transactions, or CRM information… And they certainly don’t like waiting while you roll back to your latest backup. Internally and externally, data loss and downtime waste valuable resources and hurt a company’s reputation. Downtime is becoming less acceptable every day, and data loss, even more so. Stable, safe, and secure storage should be a priority for those responsible for protecting their business (just ask Equifax).

Traditional Solutions

Due to the increasing need for high availability (HA) and disaster recovery (DR), proprietary storage companies like NetApp and Dell EMC have provided SAN and NAS technologies to protect your organization’s most important data. These hardware appliances often have no single point of failure, offer synchronous data replication, and even include a nice GUI so that users can point and click their way around. The downside? These storage appliances aren’t scalable and they are expensive. Really expensive.

The Obvious (or not so obvious) Alternative

Did you know that resiliency is built into your Linux OS? That’s right: built into the mainline Linux kernel is everything you need to replace your shared storage. For over 15 years, LINBIT has been creating the DRBD software, designed to synchronously replicate data between Linux servers seamlessly, just like your SAN. It can even trick the applications above into believing they are writing to a SAN, when in reality it is standard x86, ARM, or Power boxes. The full LINBIT HA solution combines the DRBD software with open source fail-over software as well. This combination eliminates the need for proprietary shared storage solutions. So, why aren’t you using it? You probably didn’t know that it existed.

 

For the past 20 years, those with IT know-how and small budgets have found that HA clustering, using commodity off-the-shelf hardware, was an affordable alternative to traditional storage methods. This crowd consisted of the standard Linux hacker rolling out a home-brewed web server, and the hyperscale players who didn’t want to rely on outside vendors to build their cloud. Because these hyperscale companies use the software to create a competitive advantage over their competitors, they aren’t all that eager to share their stories. They have kept the mid-market in the dark.

Almost all of the major players (including Google, Cisco, Deka Bank, HP, Porsche, and the BBC) have realized that using standard hardware instead of proprietary appliances creates a competitive advantage. Namely: inexpensive resilient storage that their competitors are paying an arm and a leg for. Now, the storage industry’s best kept secret is finally out.

It Doesn’t Stop There

LINBIT is pioneering open source SDS. In development for over 7 years, the new solution will create standard High Availability clusters like those described above, and also work perfectly for cloud storage. The LINBIT SDS software introduces performance advantages and scalability to the design. LINBIT has created a sort of “operating-system-based,” open source, Software Defined Storage technology that is already built into your existing operating system and ready to use with any Linux system.

The Default Replication Option

LINBIT’s DRBD software receives about 10,000 confirmed downloads per month (from people who opt in to share their statistics). LINBIT is far more engineering and development focused than sales focused, so if you aren’t solving a real-world problem you have probably never run into them. LINBIT’s software popularity is user driven, and comes down to three main reasons:

Flexibility: Since the DRBD software replicates data at the block level, it works with any filesystem, VM, or application that writes data to a hard drive. It can replicate multiple resources simultaneously so users don’t have to choose different replication technologies for every application/database running on the server.

Stability: Being accepted into the mainline Linux kernel is a very stringent process. DRBD was accepted into the mainline kernel in 2009 and has shipped with it since version 2.6.33.

Synchronous: Prior to DRBD’s availability (no pun intended), the only option for synchronous replication was hardware (SAN, NAS devices). The DRBD software can run in synchronous or asynchronous mode, and be used for local replication or Geo Clustering across long distances.
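In configuration terms, the replication mode is selected with the protocol setting in a resource’s net section (per connection in DRBD 9). A minimal fragment, with the asynchronous option shown commented out:

    net {
      protocol C;    # synchronous: a write completes only after the peer confirms it is on stable storage
      # protocol A;  # asynchronous: a write completes once it hits the local disk and the TCP send buffer,
                     # the usual choice for geo clustering across long distances
    }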

Now that DRBD has tools to provision your storage, scaling out has never been easier. Interested in how this might apply to your projects? Check out some of LINBIT’s (free) innovative technical documents, which describe how to set up a cluster for your specific environment. Have an idea that isn’t covered in the documentation? Reach out to [email protected] and ask if your idea is sane. They’ll consult the LINBIT engineering team and point you in the right direction. Most importantly, NEVER settle for unplanned downtime.

Find out more about the costs of downtime in the podcast, The OrionX Download with LINBIT CEO, Brian Hellman.

DRBD and Randtronics DPM

Today we’re happy to announce a new document titled “Block Replication with Filesystem Encryption” which showcases another wonderful use case for DRBD.

Block Replication with Filesystem Encryption

At Hosting Con, back in April of this year, some colleagues of mine ran into some representatives from Randtronics. Randtronics is the company responsible for the DPM (Data Privacy Management) software suite. This software suite provides file encryption, user management, ACLs, and more. I can imagine this software proving useful to those in fields where data privacy is an absolute must. Fields such as medicine, law, human resources, or intellectual property quickly come to mind.

(Graphic is property of Randtronics)

After a brief discussion with us regarding just how versatile DRBD can be, it was decided to see if DRBD could work seamlessly with DPM. Randtronics’ DPM can help protect your data from prying eyes, or those who may wish to steal it, but can it protect your data from system failures? When teamed up with DRBD, you can be assured that your data is both secure and available.

I worked briefly with Gary Lansdown of Randtronics to introduce him to asciidoc, but I must give credit to Randtronics for this document.