[Bucardo-general] size of q table

Wed Oct 27 04:50:56 UTC 2010

Hi, all,

Should the 'q' table get trimmed over time?

I've got a machine where:

 replication _is_ working (swap)
 replication latency is climbing
 the q table has 184,000-ish entries in it
 the bucardo database is at about 200 MB
 all entries in q have 'ended' times

I'm seeing log entries like:

  CTL Could not add to q sync=data_left,source=data_left,target=data_right,count=1. Sending manual notification
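
To see how often this message shows up, and for which syncs, the log can be tallied with a few lines of Python. This is only a sketch: the regular expression is derived from the single line quoted above, so any other variant of the message would be missed, and the function name tally_q_failures is made up for this example rather than taken from Bucardo.

```python
import re
from collections import Counter

# Matches the message text quoted above; any prefix on the log line
# (timestamp, PID, "CTL") is ignored because search() scans the whole line.
LINE_RE = re.compile(
    r"Could not add to q "
    r"sync=(?P<sync>[^,]+),source=(?P<source>[^,]+),"
    r"target=(?P<target>[^,]+),count=(?P<count>\d+)"
)


def tally_q_failures(lines):
    """Count 'Could not add to q' messages per (sync, source, target)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            counts[(m["sync"], m["source"], m["target"])] += 1
    return counts
```

Passing an open log file (or any iterable of lines) to tally_q_failures returns a Counter, and its most_common() method lists the syncs with the most failures first.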

CPU and I/O usage are pretty intense for the bucardo processes.  If they are all repeatedly scanning the 'q' table, that would explain it.

Some bucardo processes wind up getting stuck in 'UPDATE waiting' for as long as 'status' shows State:WAIT.  strace'ing those processes shows:

  semop(8388735, 0x7fffd36c32a0, 1

gdb shows:

  #0  0x000000309b0d50e7 in semop () from /lib64/libc.so.6
  #1  0x000000000053e753 in PGSemaphoreLock ()
  #2  0x0000000000563e61 in ProcSleep ()
  #3  0x0000000000562b57 in LockAcquire ()
  #4  0x00000000005612c4 in XactLockTableWait ()
  #5  0x0000000000451f03 in heap_update ()
  #6  0x00000000004ed1f4 in ExecutorRun ()
  #7  0x000000000056ed92 in ?? ()
  #8  0x000000000056f5a5 in PortalRun ()
  #9  0x000000000056b563 in ?? ()
  #10 0x000000000056cdff in PostgresMain ()

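So, it's waiting on a lock that never gets released (judging by XactLockTableWait under heap_update, presumably a row held by another transaction that never finishes).  I can kill those processes by hand and replication then catches up pretty well, but obviously that's not how it should work.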

Suggestions for where to look next?

Thanks,
-Bill

-- 
Bill McGonigle, Owner   
BFC Computing, LLC       
http://bfccomputing.com/ 