[check_postgres] "locks" check - checking age of lock, not max number of locks

Aleksey Tsalolikhin atsaloli.tech at gmail.com
Mon Jan 23 22:12:22 UTC 2012


Oops. I just realized I may be able to monitor lock health effectively
by monitoring the number of connections waiting for a lock. I assume
that number should usually be low and will spike when things start to
jam.

Aleksey
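The idea above, alerting on how many connections are waiting for a lock, can be sketched in Python as follows. This is a hypothetical illustration, not part of check_postgres: the rows are assumed to be `(pid, granted)` pairs like those in PostgreSQL's `pg_locks` view, where `granted = false` means the backend is still waiting. The function names and the `warn_at`/`crit_at` thresholds are invented for the example, and the exit codes follow the usual Nagios plugin convention.

```python
# Hypothetical sketch: alert on the number of sessions waiting for a lock.
# Each row mimics a (pid, granted) pair from PostgreSQL's pg_locks view;
# granted == False means that backend is waiting for the lock.
OK, WARNING, CRITICAL = 0, 1, 2  # Nagios plugin exit-code convention


def waiting_sessions(lock_rows):
    """Return the number of distinct backend pids holding an ungranted lock request."""
    return len({pid for pid, granted in lock_rows if not granted})


def waiting_status(lock_rows, warn_at, crit_at):
    """Return (state, count); warn_at and crit_at are hypothetical thresholds."""
    count = waiting_sessions(lock_rows)
    if count >= crit_at:
        return CRITICAL, count
    if count >= warn_at:
        return WARNING, count
    return OK, count


if __name__ == "__main__":
    rows = [(101, True), (102, False), (102, False), (103, False)]
    print(waiting_status(rows, warn_at=2, crit_at=10))
```

Counting distinct pids rather than rows keeps one backend that waits on several locks from inflating the number.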

On Mon, Jan 23, 2012 at 2:06 PM, Aleksey Tsalolikhin
<atsaloli.tech at gmail.com> wrote:
> First of all, check_postgres is incredibly useful, thank you!
>
> Summary:
>
> I would like to request a feature for monitoring lock age (not quantity),
> please. check_postgres would return WARNING if any lock has existed
> for longer than X seconds, and CRITICAL if any lock has existed for
> longer than Y seconds.
>
> This would be useful because I have a powerful database server
> and I don't know how many locks it takes before things become problematic;
> but I do know that if a lock persists for longer than X seconds,
> something is wrong. So I'd like to monitor for this abnormal state.
>
> Slightly more verbose version follows.
> --------------------------------------------------------------------------------
>
> Why check lock age instead of number of locks?
>
> Situation: I've upgraded our database server (it was I/O bound) and
> it's doing great now, but Nagios is firing off too many alerts because
> the lock count goes over the threshold. I've tried raising the thresholds,
> but I still get alerts: the count briefly exceeds the threshold
> and then recovers (drops back under it).
>
> I've realized I don't care how many locks I have; I do care whether the
> locks I have are "old" locks, as those can cause work to stack up instead
> of flowing through the system, until the system "jams".
>
> I have a new beefy server and I don't know how many locks it can
> handle in the course of normal operation (and our work volume is
> growing, so I can't just take a baseline). I can't say that having X
> MANY locks is bad, but I can say with certainty that having any
> lock that is over a minute old is abnormal and BAD. I'd like to
> be alerted to locks that persist for more than a minute, not
> to lots of locks that come and go quickly (which is OK).
>
> What do you think, would this be generally useful?
>
> Yours very truly,
> Aleksey
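The alerting rule requested in the quoted message (WARNING if any lock has existed for longer than X seconds, CRITICAL if longer than Y) can be sketched as a small Python function. This is a hypothetical illustration, not check_postgres code: `lock_age_status`, `warn_after`, and `crit_after` are invented names, exit codes follow the usual Nagios plugin convention, and how the lock ages are collected is deliberately left open (it is an assumption that they arrive as a list of ages in seconds).

```python
# Hypothetical sketch of the requested check: alert on lock *age*, not count.
# How the ages are gathered from the server is left open; here they are
# assumed to arrive as a list of ages in seconds.
OK, WARNING, CRITICAL = 0, 1, 2  # Nagios plugin exit-code convention


def lock_age_status(lock_ages_seconds, warn_after, crit_after):
    """Return (state, oldest_age) for the given lock ages.

    warn_after corresponds to X and crit_after to Y in the request;
    crit_after is expected to be >= warn_after.
    """
    oldest = max(lock_ages_seconds, default=0.0)  # no locks -> age 0
    if oldest > crit_after:
        return CRITICAL, oldest
    if oldest > warn_after:
        return WARNING, oldest
    return OK, oldest


if __name__ == "__main__":
    # Hundreds of short-lived locks: no alert, however many there are.
    print(lock_age_status([0.2] * 500, warn_after=60, crit_after=300))
    # A single lock that has lasted two minutes triggers a WARNING.
    print(lock_age_status([0.2, 0.5, 120.0], warn_after=60, crit_after=300))
```

Only the oldest lock matters, so a burst of many brief locks never alerts, which is exactly the behavior the request asks for.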

