[check_postgres] performance data

Thu Aug 25 15:53:06 UTC 2011

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

I'm working (with some of my Dalibo colleagues) on a Nagios perfdata
collector (yes, another one :) ), this one storing its data in a
PostgreSQL database.
If you're interested, it's here: https://github.com/dalibo/yang

Anyway, that's not directly the reason for this message. I'm facing a
problem with the way check_postgres provides performance data. Let me
explain (sorry if some of this is obvious; I'd rather be thorough):

Here's the output example provided in the Nagios documentation, for the
Ping plugin:

PING ok - Packet loss = 0%, RTA = 0.80 ms | percent_packet_loss=0, rta=0.80

This is what the plugin prints on STDOUT.
Before the pipe is the plugin's message; after the pipe is the
perfdata. Here, we have 2 performance counters, percent_packet_loss
and rta. Note that these are fixed names, so the values can be matched
across successive calls to the Ping plugin.
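
To make the format concrete, here is a minimal Python sketch that splits a plugin output line into its message and its perfdata counters. It assumes perfdata items of the form label=value[UOM];warn;crit;min;max separated by whitespace, with labels containing spaces wrapped in single quotes; `parse_plugin_output` and `PERF_ITEM` are hypothetical names, and this is not yang's actual parser.

```python
import re

# One perfdata item: a label (bare, or single-quoted so it may contain spaces),
# then '=', then everything up to the next whitespace (value[UOM];warn;crit;...).
PERF_ITEM = re.compile(r"\s*('(?:[^']|'')*'|[^'=\s]+)=(\S*)")


def parse_plugin_output(line):
    """Split plugin output into (message, {label: raw perfdata value})."""
    message, _, perfdata = line.partition("|")
    counters = {}
    for label, raw in PERF_ITEM.findall(perfdata):
        if label.startswith("'"):
            # Drop the surrounding quotes; '' inside a quoted label is one quote.
            label = label[1:-1].replace("''", "'")
        # Tolerate the comma separator shown in the documentation example.
        counters[label] = raw.rstrip(",")
    return message.strip(), counters


ping = "PING ok - Packet loss = 0%, RTA = 0.80 ms | percent_packet_loss=0, rta=0.80"
print(parse_plugin_output(ping))
# ('PING ok - Packet loss = 0%, RTA = 0.80 ms', {'percent_packet_loss': '0', 'rta': '0.80'})
```

A collector can store values under percent_packet_loss and rta because those labels are the same on every run.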

Here is an example using the query_time action from check_postgres:

POSTGRES_QUERY_TIME OK: (port=10864) longest query: -0s (database:testdb PID:12740 port:40412 address:192.168.22.31 username:testdb) | time=0.01s 'database:testdb PID:12740 port:40412 address:192.168.22.31 username:testdb'=-0;10;15

The problem is that we get a counter named «database:testdb PID:12740
port:40412 address:192.168.22.31 username:testdb». This name can
change on every execution of the plugin (a different backend PID or
client port is enough), so there is no way to graph it. It's not really
a counter name, as it contains part of the data.
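
A quick way to see the problem is to compare the perfdata labels of two runs. The sketch below is Python; the second output line is a hypothetical later run that differs only in the backend PID and client port, and `perf_labels` is an illustrative helper, not part of check_postgres or yang.

```python
import re

# A perfdata label is a bare word or a single-quoted string, followed by '='.
LABEL = re.compile(r"('(?:[^']|'')*'|[^'=\s]+)=")


def perf_labels(line):
    """Return the set of perfdata labels found after the first '|'."""
    perfdata = line.partition("|")[2]
    return {label.strip("'") for label in LABEL.findall(perfdata)}


details1 = "database:testdb PID:12740 port:40412 address:192.168.22.31 username:testdb"
# Hypothetical second run: same database, different backend PID and client port.
details2 = "database:testdb PID:12901 port:40587 address:192.168.22.31 username:testdb"
run1 = f"POSTGRES_QUERY_TIME OK: longest query: -0s | time=0.01s '{details1}'=-0;10;15"
run2 = f"POSTGRES_QUERY_TIME OK: longest query: -0s | time=0.01s '{details2}'=-0;10;15"

# Only the fixed label survives from one run to the next.
print(perf_labels(run1) & perf_labels(run2))  # {'time'}
```

Every run adds a new label that is never seen again, which is how a collector ends up with thousands of single-sample counters.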

My first question is: does anybody else agree that this is a problem
that should be fixed? For instance, our yang installation has around
6000 counters, 5000 of which come from counters like this one: their
names change all the time and they can't be used anyway, so we purge
them regularly.

My second question is: how should this be solved? I could mangle the
perfdata in yang, or add a filter somewhere, but I feel it would be
cleaner to do it in check_postgres (and I'm willing to do the work, of
course :) ).
If it is to be done in check_postgres, I see 2 possibilities (sticking
with the query_time example):
- return a fixed name for the counter, something like
«longest_query_duration»
- return more detailed counters, for instance one counter per
database. But I don't think there would be a good reason to have
counters that detailed.
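
For the first option, here is a Python sketch of what the output could look like if the connection details stay in the human-readable message and the perfdata uses a fixed label. check_postgres itself is written in Perl, so this only illustrates the output format; `format_perf` is a hypothetical helper, and «longest_query_duration» is the name suggested above, not an existing check_postgres label.

```python
def format_perf(label, value, uom="", warn="", crit=""):
    """Format one perfdata item as label=value[UOM];warn;crit."""
    # Labels containing spaces, '=' or quotes must be single-quoted,
    # with any embedded single quote doubled.
    if any(ch in label for ch in " ='"):
        label = "'" + label.replace("'", "''") + "'"
    # Drop trailing empty threshold fields (e.g. "time=0.01s;;" -> "time=0.01s").
    return f"{label}={value}{uom};{warn};{crit}".rstrip(";")


details = "database:testdb PID:12740 port:40412 address:192.168.22.31 username:testdb"
perfdata = " ".join([
    format_perf("time", "0.01", "s"),
    format_perf("longest_query_duration", 0, "s", 10, 15),
])
# The details stay in the message; the counter name no longer changes between runs.
print(f"POSTGRES_QUERY_TIME OK: (port=10864) longest query: 0s ({details}) | {perfdata}")
```

The second option would follow the same pattern with one fixed label per database (a hypothetical longest_query_duration_testdb, for instance), which keeps names stable as long as the set of databases does.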

Cheers
Marc
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk5Wb+IACgkQe8Ikm1/HTa/4LACfYh2aZnKjxsQVK5B4wxWpcBAj
D4YAoIg7zrnFLjaC3l4QXyNIS8TrfNbj
=QCCS
-----END PGP SIGNATURE-----