Monitoring: better safe than sorry…

Having stumbled upon the “Holy time-travellin’ DRBD, batman!” blog post, there’s only one thing to be said …

Be strict in what you emit, liberal in what you accept[1. Thanks, Larry]

is simply not true when dealing with mission-critical systems.

It’s OK to be alerted after upgrading a machine because the “old, working” RegEx that did the parsing doesn’t match anymore[1. e.g. because /proc/drbd got an additional field]; it’s not a problem to get an email when someone adds the 100th DRBD resource and causes the grep to fail; and so on.
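As a rough illustration of “strict in what you accept”, here is a minimal Python sketch of a parser that refuses anything it does not fully recognise. The pattern, the field names, the `UnexpectedFormat` exception and the sample line used below are all assumptions modelled on DRBD 8.x-style status output, not a definitive description of /proc/drbd; the point is only that a mismatch raises an error (and so triggers an alert) instead of being silently skipped.

```python
import re

# Hypothetical strict pattern for one resource status line.
# The field layout is an assumption modelled on DRBD 8.x-style output.
STATUS_LINE = re.compile(
    r"\s*(?P<minor>\d+): "
    r"cs:(?P<cs>\w+) "
    r"ro:(?P<ro_local>\w+)/(?P<ro_peer>\w+) "
    r"ds:(?P<ds_local>\w+)/(?P<ds_peer>\w+) "
    r"(?P<protocol>[ABC]) "
    r"(?P<flags>[-\w]{6})"
)


class UnexpectedFormat(Exception):
    """Raised when a line does not match the expected layout exactly."""


def parse_status_line(line):
    # fullmatch() rejects trailing or additional fields, so a format
    # change after an upgrade becomes a loud failure, not a wrong value.
    match = STATUS_LINE.fullmatch(line.rstrip("\n"))
    if match is None:
        raise UnexpectedFormat(f"unrecognised DRBD status line: {line!r}")
    return match.groupdict()
```

A monitoring check built on this would treat `UnexpectedFormat` as a reason to alert: an email after an upgrade is cheap compared to a check that keeps reporting “all fine” while parsing nothing.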

LRMd hangs on Ubuntu (Lucid and Maverick)

We’ve recently come across a case where stopping pacemaker (in this case via /etc/init.d/heartbeat stop) didn’t work; and, similarly, crm configure property maintenance-mode=false wouldn’t work.

After some searching and testing, the solution was found: upgrading libglib2.0-0 to at least the natty version 2.28.6-0ubuntu1 fixed the problem.

The bug seems to have been a locking problem.