[check_postgres] Changes to last_autovacuum check
Josh Berkus
josh at agliodbs.com
Sat Jan 22 23:51:25 UTC 2011
All,
I was looking at last_autovacuum (and last_autoanalyze) and trying to
improve it; the current check seems to cause a lot of false positives by
reporting tables which haven't been updated in quite some time. Here's
my first stab at a query for "table has not been autovacuumed"; rather
than depending on time, it checks for tables whose ratio of dead rows is
more than 2X what autovacuum is supposed to tolerate:
SELECT schemaname, relname,
GREATEST(last_vacuum, last_autovacuum) as last_vacuumed
FROM pg_stat_user_tables,
( SELECT setting::numeric as scalefactor
FROM pg_settings
WHERE name = 'autovacuum_vacuum_scale_factor' )
as scalefactor,
( SELECT setting::numeric as threshold
FROM pg_settings
WHERE name = 'autovacuum_vacuum_threshold' )
as threshold
WHERE n_dead_tup > threshold
and
( ( n_dead_tup - threshold ) /
(CASE WHEN n_live_tup = 0
THEN 1 ELSE n_live_tup END ))
> ( scalefactor * 2 );
This obviously needs work; for example, it doesn't take into account
tables which might have custom autovacuum settings. The query above is
written with the idea of a time threshold being passed by nagios as a
warning and critical parameter, but I'm wondering if that's the best
way. Maybe we should report the ratio of dead rows instead, and put
warning/critical thresholds on that?
I also think the above approach could be more accurate than the old
bloat query for detecting bloated tables in 8.4 and later.
If this approach seems good to people, I'll work up a WIP patch to try out.
--Josh Berkus
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
More information about the Check_postgres
mailing list