NAME

pgsi.pl - Produce system impact reports for a PostgreSQL database.

VERSION

This documentation refers to version 1.7.1

USAGE

pgsi.pl [options] < pglog_slice.log

 or...

pgsi.pl --file pglog_slice.log [options]

Options

 --file
 --query-types
 --top-10
 --all
 --pg-version
 --offenders

DESCRIPTION

System Impact (SI) is a measure of the overall load a given query imposes on a server. It is expressed as a percentage of a query's average duration over its average interval between successive calls. E.g., SI=80 indicates that a given query is active 80% of the time during the entire log interval. SI=200 indicates the query is running twice at all times on average. Thus, the lower the SI, the better.

The goal of SI is to identify those queries most likely to cause performance degradation on the database during heaviest traffic periods. Focusing exclusively on the least efficient queries can hide relatively fast-running queries that saturate the system more because they are called far more frequently. By contrast, focusing only on the most-frequently called queries will tend to emphasize small, highly optimized queries at the expense of slightly less popular queries that spend much more of their time between successive calls in an active state. These are often smaller queries that have failed to be optimized and punish a system severely under heavy load.

One thing SI does not do is distinguish between high-value queries represented by extended active states or long durations due to blocking locks. Either condition is worthy of attention, but determining which is to blame will require independent investigation.

Queries are canonized with placeholders representing literals or arguments. Further, IN lists are canonized so that variation from query to query only in the number of elements in the IN list will not be treated as distinct queries.

Some examples of the "same" query:

 SELECT col FROM table WHERE code = 'One';
 SELECT col FROM table WHERE code = 'Sixty-Three';

 SELECT foo FROM bar WHERE fuzz = $1 AND color IN ('R','G','B');
 Select FOO
 from bar
 WhErE fuzz = '56'
     AND color IN ('R', $1);

Differences in capitalization and whitespace are irrelevant.

Log Data

Pass in log data on stdin:

    pgsi.pl < some_log_slice.log
    cat some_log_slice.log | pgsi.pl

Or use the --file option:

    pgsi.pl --file=some_log_slice.log

Or read in more than one file at a time:

    pgsi.pl --file=logfile1.log --file=logfile2.log

If more than one file is given, they must be given in chronological order.

Log data must comply with a specific format and must be from contiguous activity. The code makes the assumption that the overall interval of activity is the time elapsed between the first and last log entries. If there are several blocks of logs to analyze, they must be run separately.

Required format is the following in syslog:

YYYY-MM-DDTHH24:MI:SS(-TZ:00)? server postgres[pid]:

This also requires that log_statement is set to 'all' and that log_duration be set to 'on' in postgresql.conf. If you are not using syslog, you can simulate the format with the following:

log_line_prefix = '%t %h postgres[%p]: [%l-1] ' ## Simulate syslog for pgsi.

Options

--query-types

Query impact is segregated by types. I.e., all the SELECTs together, all UPDATEs together, etc. Typically it is assumed that SELECT is the most interesting (and is by itself the default), but any query type may be analyzed. Multiples are provided as space- or comma-separated lists.

    pgsi.pl --query-types="select, update, copy, create"

The code will produce a unique report for each type when used with the --all and/or --top-10 file-pattern options (see below).

--top-10, --all

Supplies a file pattern and optional directory path into which the reports should be written per --query-type. The pattern is prefixed with the --query-type and host for this report and placed into the requested directory (or cwd if no path is present).

--all will list every canonized query encountered, which is likely to contain a large number of queries of no interest (those with negligible impact).

--top-10 limits the report to only the 10 entries with the greatest SI.

    pgsi.pl \
        --query-types=select,update \
        --all=si_reports/monday_10am_1pm.all.txt \
        --top-10=si_reports/monday_10am_1pm.t10.txt

This will produce the following reports in si_reports/ for a database running on server db1:

    SELECT-db1-monday_10am_1pm.all.txt
    UPDATE-db1-monday_10am_1pm.all.txt
    SELECT-db1-monday_10am_1pm.t10.txt
    UPDATE-db1-monday_10am_1pm.t10.txt

If --top-10 is not supplied, then no top 10 report is generated. If --all is not supplied, then the report(s) print to stdout.

--pg-version

Currently, this might better be described as either "before DETAIL" or "after DETAIL". The code was written against PG 8.1 originally, but when 8.2 came out the addition of DETAIL log entries forced a different parser. That unfortunate timing led to the assumption that log construction would change with each release. Going forward, --pg-version will be (other than 8.1) the first version in which this log format was encountered.

--pg-version is only either 8.1 or 8.2 (8.2 is default). It's unknown how far back in versions the 8.1 format holds, but 8.2 holds for itself and 8.3. So, unless you're working against logs generated by a PG version less than 8.2, you do not need to include this option (but it might save you some trouble if a new format comes at a later version and the default bumps up to the most recent while you stay on your older version).

    pgsi.pl --pg-version=8.1

--offenders

Number of best and worst queries to included with the report, in terms of overall duration of execution. Enough log information is listed along with the duration such that tracking down the original query (not the canonized version) is straightforward. The offenders list can be very useful for a query that is causing trouble in a handful of permutations, but most of the time is behaving well.

The list in conjunction with standard deviation gives an overall indication of performance volatility.

--offenders=5 produces additional output in the report that looks something like the following example:

 Best
   1. 2009-01-12T10:11:49-07:00 db1 postgres[4692] -- 4.833 ms
   2. 2009-01-12T10:31:19-07:00 db1 postgres[1937] -- 4.849 ms
   3. 2009-01-12T09:16:20-07:00 db1 postgres[20294] -- 4.864 ms
   4. 2009-01-12T10:16:54-07:00 db1 postgres[20955] -- 4.867 ms
   5. 2009-01-12T10:32:16-07:00 db1 postgres[5010] -- 4.871 ms

 Worst
   1. 2009-01-12T10:00:07-07:00 db1 postgres[2804] -- 2175.650 ms
   2. 2009-01-12T09:30:07-07:00 db1 postgres[2804] -- 2090.914 ms
   3. 2009-01-12T10:00:18-07:00 db1 postgres[2804] -- 2046.608 ms
   4. 2009-01-12T09:30:10-07:00 db1 postgres[2804] -- 1954.604 ms
   5. 2009-01-12T11:20:11-07:00 db1 postgres[2804] -- 1788.576 ms

BUGS

If queries contain exceptionally long IN lists, the regex that attempts to flatten them can run into a perl recursion limit. In that event, the query will keep the placeholders of the IN list, making it unique compared to the same query with a different cardinality of list params in the same IN. This deficiency should only surface on IN lists with composite parameters [e.g., IN ((?,?,...,?),(?,?,...,?),...,(?,?,...,?))]. For scalar IN lists, there should be no such limit.

AUTHOR

Original code: Mark Johnson (mark@endpoint.com), End Point Corp.

Contributions: Ethan Rowe (ethan@endpoint.com), End Point Corp. Greg Sabino Mullane (greg@endpoint.com), End Point Corp. Daniel Browning (db@endpoint.com), End Point Corp. Joshua Tolley <josh@endpoint.com>, End Point Corp. Abraham Ingersoll <abe@abe.us>

LICENSE AND COPYRIGHT

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See the LICENSE file.