Oracle DynDNS with Booth Geo-Clustering

Pacemaker was never designed to operate across a WAN or any other high-latency network. However, there has always been a need to orchestrate active/passive failover between data centers and across long distances. To address this, the Booth add-on for Pacemaker was conceived in late 2011. LINBIT has been involved in the development of Booth since 2013 and has offered it as a supported solution since 2015.

Booth addresses this shortcoming of Pacemaker by introducing the concept of “tickets”. We constrain particular resources to a ticket, and only the site which holds that ticket may start those resources. This is reminiscent of the token ring networks of days past. To ensure that the cluster never splits and that two sites never hold the ticket at the same time, Booth uses arbitration nodes to achieve quorum and sets an expiration period on each ticket. If a site loses communication with the rest of the Booth cluster, its ticket will not be renewed and the site will stop the dependent resources within the expected time frame.
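The ticket idea can be sketched as a small model. This is only an illustration of "one owner, a majority to grant, and a lease that expires", not Booth's actual protocol or API; the names `Ticket`, `has_quorum`, `try_acquire_or_renew` and `may_run_resources`, the site names, and the 600 second expiry are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    """Simplified ticket lease: at most one owner, valid until `expires_at`."""
    name: str
    expire: float                    # lease length in seconds (hypothetical value)
    owner: Optional[str] = None
    expires_at: float = 0.0

    def is_valid(self, now: float) -> bool:
        return self.owner is not None and now < self.expires_at


def has_quorum(reachable: int, total: int) -> bool:
    """A strict majority of all members (sites plus arbitrators) must be reachable."""
    return reachable > total // 2


def try_acquire_or_renew(ticket: Ticket, site: str,
                         reachable: int, total: int, now: float) -> bool:
    """Grant or renew the ticket for `site` if it has quorum and nobody else holds a valid lease."""
    if not has_quorum(reachable, total):
        return False                 # an isolated site can neither acquire nor renew
    if ticket.is_valid(now) and ticket.owner != site:
        return False                 # another site still holds an unexpired lease
    ticket.owner = site
    ticket.expires_at = now + ticket.expire
    return True


def may_run_resources(ticket: Ticket, site: str, now: float) -> bool:
    """Resources tied to the ticket may only run where a valid lease is held."""
    return ticket.owner == site and ticket.is_valid(now)
```

In this model, with two sites and one arbitrator, a site that can only see itself (1 of 3 members) fails to renew, and the other site can take the ticket only once the old lease has run out. That waiting period is why the expiry time bounds how long a failover takes.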

While Pacemaker with Booth addresses the issues of High Availability across the WAN, one issue which has always proven difficult is redirecting client traffic to the new site. In most of our demonstrations of Booth we have simply used round-robin DNS (such as in my demonstration here: Booth Geo Cluster Demo). While round-robin DNS is simple and easy to configure, it is quite inefficient: with two sites, every other request is sent to the inactive site and discarded.

LINBIT has recently been working with Oracle DynDNS to find a more efficient solution. Fortunately, Oracle DynDNS offers a Managed DNS service touting a feature aptly named “Active Failover”. Active Failover can be configured to monitor the health of a service in several ways: the managed Oracle DynDNS servers can monitor an IP address via ping, SMTP, HTTP(S), or a particular listening TCP port, and update the DNS destinations only when the service fails and Pacemaker switches sites. This makes it much more efficient and a good match for Pacemaker clusters utilizing Booth.
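To make the "listening TCP port" style of health check concrete, here is a minimal sketch of what such a probe and the resulting failover decision might look like. It is not how Oracle's monitors are implemented and does not use any Oracle API; the function names `tcp_port_is_open` and `choose_active_address` and the example addresses are hypothetical.

```python
import socket


def tcp_port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def choose_active_address(primary: str, secondary: str, port: int,
                          probe=tcp_port_is_open) -> str:
    """Keep answering with the primary address while its probe passes, otherwise fail over."""
    return primary if probe(primary, port) else secondary
```

For a MariaDB service, a call such as `choose_active_address("192.0.2.10", "198.51.100.10", 3306)` would return the secondary address only when nothing accepts connections on port 3306 at the primary. The `probe` parameter exists so that a different check (ping, HTTP, and so on) can be swapped in.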

To demonstrate this solution in detail we have developed a tech-guide which outlines, step-by-step, how to configure this using RHEL 7, Pacemaker, Booth, and Oracle DynDNS Managed DNS, to provide a Highly Available, Geo-Clustered, MariaDB service. This document can be found in the documentation section of our website at the link below.

DRBD 9 Now Supports Fencing In Pacemaker

Fencing is the process of isolating a node from a computer cluster or protecting shared resources when a node appears to be malfunctioning. As the number of nodes in a cluster increases, so does the likelihood of failure.

Change the cluster distribution without downtime

We recently upgraded one of our virtualization clusters (more RAM), and in the course of this upgraded the virtualization hosts from Ubuntu Lucid to RHEL 6.3, all without any service interruption.

Editing the Pacemaker configuration with VIM

For people using the VIM editor I’ve got two small tips when editing Pacemaker configurations:

Use syntax highlighting. This makes unmatched quote characters easy to spot. Whether it’s too colorful can be discussed, though 😉
A current version can be found here, and the mailing list post is here.

For correlating resource names I recommend the Mark plugin.

DRBD resources need different monitor intervals

As briefly mentioned in Pacemaker Explained, DRBD devices need two different values set for their monitor intervals:

primitive pacemaker-resource-name ocf:linbit:drbd         \
        params drbd_resource="drbd-resource"              \
        op monitor interval="61s" role="Slave"            \
        op monitor interval="59s" role="Master"

The reason is that Pacemaker distinguishes monitor operations by their resource and their interval, but not by their role. So, if the two roles are not deliberately given different intervals, Pacemaker will monitor only one of the two (and, with DRBD 9, more) nodes, which is usually not what you want.
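A toy model shows why identical intervals collide. Pacemaker names recurring operations after the resource, the operation and the interval in milliseconds (for example `drbd_monitor_59000`). The helper names and the "later entry replaces the earlier one" behaviour below are simplifying assumptions for illustration, not Pacemaker code.

```python
def op_key(resource: str, op: str, interval_s: int) -> str:
    """Build an operation identifier from resource, operation and interval (ms); the role is not part of it."""
    return f"{resource}_{op}_{interval_s * 1000}"


def registered_monitors(resource: str, monitors: list) -> dict:
    """Map operation keys to roles; `monitors` is a list of (role, interval_s) pairs.

    Two monitors with the same interval produce the same key, so only one survives.
    """
    table = {}
    for role, interval_s in monitors:
        table[op_key(resource, "monitor", interval_s)] = role
    return table


# Same interval for both roles: the keys collide and one monitor is lost.
same = registered_monitors("drbd", [("Slave", 60), ("Master", 60)])

# Distinct intervals, as in the configuration above: both monitors are kept.
distinct = registered_monitors("drbd", [("Slave", 61), ("Master", 59)])
```

Here `same` ends up with a single entry while `distinct` has two, which is the reason for values such as 59s and 61s instead of a shared 60s.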

LRMd hangs on Ubuntu (Lucid and Maverick)

We’ve recently come across a case where stopping Pacemaker (in this case via /etc/init.d/heartbeat stop) didn’t work; similarly, crm configure property maintenance-mode=false wouldn’t work either.

After some searching and testing, we found the solution: upgrading libglib2.0-0 to at least the Natty version, 2.28.6-0ubuntu1, fixed the problem.

The bug seems to have been a locking problem.