Monitoring: better safe than sorry…

After stumbling upon the “Holy time-travellin’ DRBD, batman!” blog post, I have only one thing to say …

Be strict in what you emit, liberal in what you accept[1. Thanks, Larry]

is simply not true when dealing with mission-critical systems.

It’s OK to be alerted when upgrading a machine because the “old, working” regex that did the parsing doesn’t match anymore[1. e.g. because /proc/drbd got an additional field]; it’s not a problem to get an email when someone adds the 100th DRBD resource and causes the grep to fail; and so on.

It’s better to have a few false positives while you’re actively changing things than to get a false negative that costs you months of data; that’s what an assert is for, after all, and monitoring isn’t that different.

Keep monitoring strict, and let it fail loudly on unexpected things – after the first few occurrences they’re not unexpected anymore and can be dealt with.
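
To make that a bit more concrete, here is a minimal sketch of what a “strict” check could look like. It is only an illustration: the status line layout is a simplified, hypothetical one loosely modelled on /proc/drbd (the real format differs between DRBD versions), and the names `LINE_RE`, `UnexpectedFormat` and `parse_status` are made up for this example.

```python
import re

# Hypothetical, simplified status line such as
#   " 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate"
# The pattern is deliberately narrow: at most two digits for the minor
# number and exactly three fields, so anything new makes the match fail.
LINE_RE = re.compile(
    r"\s*(?P<minor>\d{1,2}): cs:(?P<cs>\w+) ro:(?P<ro>\w+/\w+) ds:(?P<ds>\w+/\w+)"
)


class UnexpectedFormat(Exception):
    """Raised when the input does not look exactly as expected."""


def parse_status(text, expected_resources):
    """Parse every non-empty line strictly; raise on anything unexpected."""
    resources = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # fullmatch: a trailing extra field is an error, not something to skip.
        m = LINE_RE.fullmatch(line)
        if m is None:
            raise UnexpectedFormat(f"unparseable line: {line!r}")
        resources[int(m["minor"])] = (m["cs"], m["ro"], m["ds"])
    # The assert-like part: the number of resources must be what we configured.
    if len(resources) != expected_resources:
        raise UnexpectedFormat(
            f"expected {expected_resources} resources, found {len(resources)}"
        )
    return resources
```

A monitoring plugin built around something like this would turn `UnexpectedFormat` into an alert rather than silently reporting “all fine” for the lines it happened to understand. The false positive after an upgrade is the price; the lead-in assumptions (field names, two-digit limit) are exactly the things you would adjust once the alert has fired.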
