After stumbling upon the “Holy time-travellin’ DRBD, batman!” blog post, there’s only one thing to be said:
Be strict in what you emit, liberal in what you accept[1. Thanks, Larry]
is simply not true when dealing with mission-critical systems.
It’s OK to be alerted on upgrading a machine because the “old, working” RegEx that did the parsing doesn’t match anymore[1. e.g. because
/proc/drbd got an additional field]; it’s not a problem to get an email when someone adds the 100th DRBD resource and causes the grep to fail; and so on.
Better to have a few false positives when you’re actively changing things than to get a false negative that costs you months of data; that’s what an
assert (and monitoring isn’t that different) is for, after all.
Keep monitoring strict, and let it fail loudly on unexpected things – after the first few occurrences they’re not unexpected anymore and can be dealt with.
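
To make that concrete, here is a minimal sketch of what "strict" could look like in a check script. It is not taken from any real monitoring plugin: the line format is a simplified, hypothetical stand-in loosely modelled on /proc/drbd (the real file carries more lines and fields), and the names RESOURCE_LINE, UnexpectedFormat and check_status are made up for illustration.

```python
import re

# Hypothetical, simplified resource line loosely modelled on /proc/drbd.
# The real file has more lines and fields; this pattern is an assumption.
RESOURCE_LINE = re.compile(
    r"(?P<minor>\d+): cs:(?P<cs>\w+) ro:(?P<ro>\w+/\w+) ds:(?P<ds>\w+/\w+)"
)


class UnexpectedFormat(Exception):
    """Raised when the status text does not look exactly as expected."""


def check_status(text, expected_resources):
    """Return a list of problems; raise on anything the parser does not know."""
    problems = []
    seen = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # fullmatch, not search: an extra field must not slip through.
        match = RESOURCE_LINE.fullmatch(line)
        if match is None:
            # Strict: an unknown line is an alert, not something to skip.
            raise UnexpectedFormat(f"unparsable line: {line!r}")
        seen += 1
        if match["cs"] != "Connected":
            problems.append(f"resource {match['minor']}: cs is {match['cs']}")
        if match["ds"] != "UpToDate/UpToDate":
            problems.append(f"resource {match['minor']}: ds is {match['ds']}")
    if seen != expected_resources:
        # A resource appearing or vanishing is also worth an email.
        raise UnexpectedFormat(
            f"expected {expected_resources} resources, saw {seen}"
        )
    return problems
```

The point of the sketch is the two raise statements: a line the pattern does not cover, or a resource count that differs from what was configured, stops the check instead of being silently ignored. The first upgrade that adds a field will trip it, and that is exactly the email you want.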