[Bucardo-general] My locks overfloweth
Mitchell Perilstein
Mitchell.Perilstein at trueposition.com
Fri Apr 12 20:15:49 UTC 2013
Has anyone seen this before?
We're using pg 9.1.5 on Solaris 10, Perl 5.8.4, and Bucardo 4.99.7. We
have two boxes in a master-master swap sync with conflict_strategy=latest,
comprising around 39 tables across 3 databases, roughly 50k rows in all.
For convenience we have divided them into 3 dbgroups and 4 syncs total.
With some data in both boxes' databases, mostly in sync, and our app
layer turned off (no other db clients), we can start Bucardo and end up
with locks looking something like this:
# psql -Upostgres -c "select pid,count(pid) from pg_locks group by
pid order by count(pid)"
pid | count
------+-------
7404 | 2
6947 | 16
6942 | 32
6930 | 74
6855 | 84
6940 | 1678
(6 rows)
That's over 1600 locks just for Bucardo. Notably, this is only on one box;
the other Bucardo is using around 400 locks to perform the same sync.
Looking for the culprit queries above, pid 6940 turns out to be mostly
idle, holding locks on many different tables with no current query:
# psql -Upostgres -c "select database, relation, pid, mode,
current_query from pg_locks join pg_stat_activity on (pid=procpid)"
database | relation | pid | mode | current_query
----------+----------+------+------------------+----------------------
27512 | 28671 | 6940 | SIReadLock | <IDLE>
27512 | 28524 | 6940 | SIReadLock | <IDLE>
27512 | 28562 | 6940 | SIReadLock | <IDLE>
27512 | 28671 | 6940 | SIReadLock | <IDLE>
27512 | 28726 | 6940 | SIReadLock | <IDLE>
27512 | 28634 | 6940 | SIReadLock | <IDLE>
.... etc ....
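As I understand it, those SIReadLocks are predicate locks taken by
SERIALIZABLE transactions under PostgreSQL's serializable snapshot
isolation (new in 9.1), and they can be held after a statement finishes,
which would explain an <IDLE> backend still holding them. Per the
PostgreSQL docs, the shared predicate-lock table is sized at server start
as max_pred_locks_per_transaction * (max_connections +
max_prepared_transactions). A quick back-of-envelope sketch, assuming the
stock defaults (our postgresql.conf may of course differ):

```python
# Rough capacity of PostgreSQL's shared predicate-lock table,
# using the formula from the max_pred_locks_per_transaction docs.
# The values below are the 9.1 defaults, assumed for illustration.
max_pred_locks_per_transaction = 64
max_connections = 100
max_prepared_transactions = 0

capacity = max_pred_locks_per_transaction * (
    max_connections + max_prepared_transactions
)
print(capacity)  # 6400 trackable objects, shared by all sessions
```

So with defaults the whole cluster gets roughly 6400 slots, which the
20,000-30,000 locks we observe would exhaust several times over.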
Now, if I start our app, which will begin doing writes on both boxes,
we'll see transient activity from our queries and from Bucardo doing its
COPY ... STDIN work, but those queries do their work and go away. The
lock count will rise over several minutes to 20,000 or 30,000 on one
box in a similar manner. Around that point we'll start to hit this:
(28706) [Fri Apr 12 18:28:54 2013] KID New kid, sync "o1_sync"
alive=1 Parent=27035 PID=28706 kicked=1
(28706) [Fri Apr 12 18:28:54 2013] KID DBD::Pg::db pg_result failed:
ERROR: out of shared memory HINT: You might need to increase
max_pred_locks_per_transaction. at /tpapp/tpdb/lib/perl5/Bucardo.pm
line 3140. Line: 4801 Main DB state: ? Error: none DB source_ossgw
state: 53200 Error: 7 DB target_ossgw state: ? Error: none
(28706) [Fri Apr 12 18:28:54 2013] KID Kid 28706 exiting at
cleanup_kid. Sync "o1_sync" public.commonobjects Reason: DBD::Pg::db
pg_result failed: ERROR: out of shared memory HINT: You might need
to increase max_pred_locks_per_transaction. at
/tpapp/tpdb/lib/perl5/Bucardo.pm line 3140. Line: 4801 Main DB
state: ? Error: none DB source_ossgw state: 53200 Error: 7 DB
target_ossgw state: ? Error: none
and things start failing on random tables and queries everywhere, for all
clients. We've tried bumping the pg lock settings to no avail. We'd
like to understand the lock usage, which we assume is the root cause here.
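One thing worth double-checking (in case it helps anyone else): the
setting named in the HINT is max_pred_locks_per_transaction, which is
separate from the regular lock table's max_locks_per_transaction; bumping
only the latter would leave the predicate-lock table at its default size.
Something like the following in postgresql.conf (the value 1024 is just an
illustration, and a server restart is required):

```
# postgresql.conf -- needs a full server restart, not just a reload
max_pred_locks_per_transaction = 1024   # default is 64
```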
Any ideas appreciated. Thanks!