DRBD 9 Now Supports Fencing In Pacemaker

Fencing is the process of isolating a node from a computer cluster or protecting shared resources when a node appears to be malfunctioning. As the number of nodes in a cluster increases, so does the likelihood of failure.


Change the cluster distribution without downtime

We recently upgraded one of our virtualization clusters (more RAM) and, in the course of this, moved the virtualization hosts from Ubuntu Lucid to RHEL 6.3, all without any service interruption.

Editing the Pacemaker configuration with VIM

For people using the VIM editor I’ve got two small tips when editing Pacemaker configurations:

Use syntax highlighting. This makes unmatched quote characters easy to spot. Whether it’s too colorful can be discussed, though 😉
A current version can be found here, and the mailing list post is here.

For correlating resource names I recommend the Mark plugin.

DRBD resources need different monitor intervals

As briefly mentioned in Pacemaker Explained, DRBD devices need two different values set for their monitor intervals:

primitive pacemaker-resource-name ocf:linbit:drbd         \
        params drbd_resource="drbd-resource"              \
        op monitor interval="61s" role="Slave"            \
        op monitor interval="59s" role="Master"

The reason is that Pacemaker distinguishes monitor operations by their resource and their interval, but not by their role. So, unless you set distinct intervals yourself, Pacemaker will monitor only one of the two (or, with DRBD 9, more) nodes, which is usually not what you want.
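To make the collision concrete, here is a small toy model in Python. It is not Pacemaker's code and makes no claim about its internals; it only mirrors the rule described above, that a recurring monitor is identified by resource and interval while the role is ignored. The function name schedule_monitors, the "60s" interval and the choice of which operation survives are all assumptions made for illustration.

```python
# Toy model only: NOT Pacemaker's actual code. It mirrors the rule described
# above: a recurring monitor is identified by (resource, interval); the role
# is not part of the key.

def schedule_monitors(resource, ops):
    """Return the monitors that survive, keyed by (resource, interval)."""
    scheduled = {}
    for op in ops:
        key = (resource, op["interval"])
        # setdefault keeps the first op for a key. Which one "wins" is an
        # assumption of this sketch; the point is that only one survives.
        scheduled.setdefault(key, op["role"])
    return scheduled

# Hypothetical configuration with the same interval for both roles.
same = schedule_monitors("drbd-resource", [
    {"interval": "60s", "role": "Slave"},
    {"interval": "60s", "role": "Master"},
])

# The configuration from the snippet above: 61s and 59s.
distinct = schedule_monitors("drbd-resource", [
    {"interval": "61s", "role": "Slave"},
    {"interval": "59s", "role": "Master"},
])

print(len(same))      # 1: the two roles collapsed into a single monitor
print(len(distinct))  # 2: one monitor per role
```

With identical intervals the two role-specific operations share one key, so only one of them remains. Offsetting the intervals by a second or two gives each role its own entry.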

LRMd hangs on Ubuntu (Lucid and Maverick)

We recently came across a case where stopping Pacemaker (in this case via /etc/init.d/heartbeat stop) didn’t work; similarly, crm configure property maintenance-mode=false wouldn’t work either.

After some searching and testing we found the solution: upgrading libglib2.0-0 to at least the Natty version, 2.28.6-0ubuntu1, fixed the problem.

The bug seems to have been a locking problem.