[Bucardo-general] KID dies with 'could not serialize access due to concurrent update' (prevents replication)

Wed Dec 7 08:37:50 UTC 2011

On 12/07/2011 02:06 AM, Bill McGonigle wrote:
> I did the usual debugging of stopping/starting bucardo, validating the syncs, even restarting postgresql to be certain.  System logs, disk, etc. all look good.

I might have found a cause of this (though I don't know why): high
uptime.  Two things got replication running again (at least for the time
being?):

1) Using iptables to wall off the clients (Tomcat apps), then restarting
PostgreSQL and Bucardo.  At that point, all the queued data on both
masters went through (without any problems that I can see).  But when I
re-enabled access for the clients, I again got KIDs dying with
'concurrent update' errors.  What could be contending?

2) Rebooting the virtual machines the databases run on.  The VM with
Bucardo on it had 405 days of uptime.  I wonder if somewhere in the
Linux/PostgreSQL/Perl/Bucardo stack there might be a data-type overflow
involving the system clock.  Probably not Linux, since the error appears
to be at the application level.

I don't have a theory that explains both of those data points to my
satisfaction.

-Bill

-- 
Bill McGonigle, Owner
BFC Computing, LLC
http://bfccomputing.com/
Telephone: +1.855.SW.LIBRE
Email, IM, VOIP: bill at bfccomputing.com
VCard: http://bfccomputing.com/vcard/bill.vcf
Social networks: bill_mcgonigle/bill.mcgonigle