[check_postgres] performance data

Mon Aug 29 20:26:52 UTC 2011

On Thu, 2011-08-25 at 17:53 +0200, Marc Cousin wrote:
> [...]
> I'm working (with some of my Dalibo colleagues) on a nagios perfdata
> collector (yes, another one :) ), this time storing data in a
> PostgreSQL database.
> If you're interested, it's there : https://github.com/dalibo/yang
> 
> Anyway, that's not directly the reason for this message. I'm facing a
> problem with the way check_postgres provides performance data. Let me
> explain (sorry if I'm explaining trivial things, I feel I'd rather be
> thorough on this explanation):
> 
> Here's the output example provided in the Nagios documentation, for the
> Ping plugin:
> 
> PING ok - Packet loss = 0%, RTA = 0.80 ms | percent_packet_loss=0,
> rta=0.80
> 
> It's what's returned on STDOUT from the program.
> Before the pipe is the plugin's message, after the pipe is the
> perfdata. Here, we have 2 performance counters, percent_packet_loss
> and rta. Note that these are fixed names, so it is possible to match
> those between 2 calls to this Ping plugin.
> 
> Here is an example using the query_time action from check_postgres:
> 
> POSTGRES_QUERY_TIME OK: (port=10864) longest query: -0s
> (database:testdb PID:12740 port:40412 address:192.168.22.31
> username:testdb) | time=0.01s 'database:testdb PID:12740 port:40412
> address:192.168.22.31 username:testdb'=-0;10;15
> 
> The problem is that we have a counter named «database:testdb PID:12740
> port:40412 address:192.168.22.31 username:testdb». This name
> potentially changes at every execution of the plugin, so there is no
> way to graph it. It's not really a counter name, as it contains a part
> of the data.
> 
> My first question is: does anybody agree that this is a problem, and
> it should be fixed ? For instance, in our yang installation, we have
> around 6000 counters, 5000 coming from counters like this one, whose
> names change all the time, and can't be used anyway. So we purge each
> of those regularly.
> 

I guess we do, as it's already fixed :)
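
To make the label problem concrete, here is a minimal Python sketch. It
is not part of check_postgres or yang, and the names split_output and
perf_labels are made up for illustration. It splits a plugin output
line at the pipe and pulls out the perfdata labels, assuming the usual
Nagios convention of label=value items, with single quotes around
labels that contain spaces.

```python
import re

# One perfdata item: either a 'quoted label' (a doubled '' stands for a
# literal quote) or a bare label, then '=', then the value fields.
PERF_ITEM = re.compile(r"(?:'((?:[^']|'')*)'|([^\s=']+))=(\S*)")


def split_output(line):
    """Split plugin output into (message, perfdata) at the first pipe."""
    message, _, perfdata = line.partition("|")
    return message.strip(), perfdata.strip()


def perf_labels(perfdata):
    """Return the counter labels found in a perfdata string, in order."""
    labels = []
    for match in PERF_ITEM.finditer(perfdata):
        quoted, bare = match.group(1), match.group(2)
        if quoted is not None:
            labels.append(quoted.replace("''", "'"))
        else:
            labels.append(bare)
    return labels


# Hypothetical usage with the Ping example from the Nagios documentation.
ping = ("PING ok - Packet loss = 0%, RTA = 0.80 ms "
        "| percent_packet_loss=0, rta=0.80")
print(perf_labels(split_output(ping)[1]))
```

Fed the old query_time output quoted above, the second label comes back
as the whole «database:... PID:... port:...» string, so a collector
keyed on the label sees a new counter on almost every run.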

> My second question is: how to solve this ? I could mangle the
> perfdata in yang, or have a filter somewhere, but I feel this would be
> cleaner if done in check_postgres (and I'm willing to do so, of course
> :) ).
> If it is to be done in check_postgres, I see 2 possibilities (if we
> keep on the query_time example):
> - return a fixed name for the counter. Something like
> «longest_query_duration»
> - return more detailed counters. For instance, one counter per
> database. But I don't think there would be a good reason to have
> counters that detailed.
> 

This is what current git master gives:

POSTGRES_QUERY_TIME OK:  longest query: 0s DB:  | query_time=0s;;10
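
With a fixed label like this, a collector can key on the name and just
store the value. A rough Python sketch of reading one such item
follows. It assumes the standard Nagios perfdata layout
label=value[UOM];[warn];[crit];[min];[max]; PerfValue and parse_item
are hypothetical names, not anything from check_postgres or yang.

```python
import re
from typing import NamedTuple, Optional


class PerfValue(NamedTuple):
    label: str
    value: float
    unit: str
    warn: Optional[str]
    crit: Optional[str]


# A number followed by an optional unit of measurement (s, ms, %, B, ...).
VALUE_UNIT = re.compile(r"^(-?\d+(?:\.\d+)?)([A-Za-z%]*)$")


def parse_item(item):
    """Parse one perfdata item such as 'query_time=0s;;10'."""
    label, _, rest = item.rpartition("=")
    fields = rest.split(";")
    match = VALUE_UNIT.match(fields[0])
    if match is None:
        raise ValueError("unparsable perfdata value: %r" % fields[0])
    # Thresholds are optional; pad so warn and crit can always be indexed.
    fields += [""] * (3 - len(fields))
    return PerfValue(
        label=label.strip("'"),
        value=float(match.group(1)),
        unit=match.group(2),
        warn=fields[1] or None,
        crit=fields[2] or None,
    )


print(parse_item("query_time=0s;;10"))
```

Here the empty second field means no warning threshold was given, and
10 is the critical threshold.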

Strange thing is that the webpage still states that the latest release
is 2.16.0 (and http://bucardo.org/check_postgres/latest_version.txt says
the same). Maybe it's time to get a new release out of the door.
Officially, I mean :)

-- 
Guillaume
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com