[check_postgres] "locks" check - checking age of lock, not max number of locks

Aleksey Tsalolikhin atsaloli.tech at gmail.com
Mon Jan 23 22:06:46 UTC 2012


First of all, check_postgres is incredibly useful, thank you!

Summary:

I would like to request a feature for monitoring lock age (not quantity),
please.  check_postgres would return WARNING if any lock has existed
for longer than X seconds, and CRITICAL if any lock has existed
for longer than Y seconds.
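
To illustrate the behavior I have in mind, here is a minimal sketch in
Python.  It is not check_postgres code: the function name
classify_lock_ages and its parameters are made up for illustration, and
it assumes the lock ages (in seconds) have already been obtained somehow;
how to get them from PostgreSQL is outside this sketch.  The only fixed
convention used is the standard Nagios plugin exit codes (0, 1, 2).

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify_lock_ages(ages_seconds, warning, critical):
    """Hypothetical helper: map lock ages to a Nagios status.

    ages_seconds: iterable of lock ages in seconds (assumed to be
                  collected elsewhere).
    warning:      the X threshold, in seconds.
    critical:     the Y threshold, in seconds.
    Returns a (exit_code, message) tuple.
    """
    # Only the oldest lock matters; the number of locks is ignored.
    oldest = max(ages_seconds, default=0)
    if oldest > critical:
        return CRITICAL, f"CRITICAL: oldest lock is {oldest}s (> {critical}s)"
    if oldest > warning:
        return WARNING, f"WARNING: oldest lock is {oldest}s (> {warning}s)"
    return OK, f"OK: oldest lock is {oldest}s"


if __name__ == "__main__":
    # Many short-lived locks: fine, regardless of how many there are.
    print(classify_lock_ages([0.2] * 500, warning=30, critical=60))
    # A single lock older than a minute: alert.
    print(classify_lock_ages([0.2, 75], warning=30, critical=60))
```

The point of the design is that the count of locks never enters the
decision, so a brief burst of many short locks does not alert.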

This would be useful because I have a powerful database server
and I don't know how many locks it would take to cause problems;
but I do know that any lock that stays around for longer than X seconds
is a problem.  So I'd like to monitor for this abnormal state.

Slightly more verbose version follows.
--------------------------------------------------------------------------------

Why check lock age instead of number of locks?

Situation:  I've upgraded our database server (it was I/O bound) and
it's doing great now; but Nagios is firing off too many alerts due to
lock count going over threshold.  I've tried bumping up the thresholds
but I still get alerts -- what happens is we go over threshold briefly
and then recover (go under threshold).

I've realized I don't care how many locks I have; I do care whether the
locks I have are "old", since old locks can cause work to stack up
instead of flowing through the system, until the system "jams".

I have a new beefy server and I don't know how many locks it can
handle in the course of normal operation (and our work volume is
growing, so I can't just take a baseline).  I can't say that having X
MANY locks is bad; but I can say with certainty that having any
lock that is over a minute old is abnormal and BAD.  I'd like to
be alerted to locks that persist for more than a minute, not
to lots of locks that come and go quickly (which is OK).

What do you think, would this be generally useful?

Yours very truly,
Aleksey

