[Bucardo-general] Complex Bucardo install reaching limits

Chris Keane chris.keane at zoomius.com
Fri Jan 2 18:01:35 UTC 2015


Greg, et al,

Our bucardo install is pretty complex, and during 2014 we essentially
reached the limit of what we can do with it. I know that the solution for
our further expansion is "in the works", but in the meantime I wanted to
explain what we're doing and see if you have any suggestions for
overcoming our current limitations.

In our environment we have a central database (an Amazon Postgres RDS; we
were the ones who pressured them into allowing session_replication_role to
be set, just so we could run bucardo ;) ). The central database drives the
main internet application system.
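
(For context: as I understand it, bucardo needs to issue roughly the
following on the connections it writes through, so that the rows it
replicates don't re-fire ordinary triggers, including its own
delta-capture triggers:

    SET session_replication_role = 'replica';

and until AWS allowed that setting, bucardo simply couldn't target RDS.)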

Each of our clients has a replica of the main database on a laptop. The
laptop goes with them to events. Outside of events, the laptops are usually
powered down and stored, but during events each laptop is powered up and
replicating back and forth with the internet server.

To solve this "connected/disconnected" problem we have set up each laptop
to run its own bucardo instance with its own client-specific syncs. On each
of them, the internet server tables are set to "makedelta" so that changes
from one client laptop are replicated to the other client laptops via the
internet server whenever those laptops are powered on.
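
For concreteness, each laptop's setup looks roughly like the following
(the names are invented here, and the exact makedelta options are worth
double-checking against "bucardo help add db" / "bucardo help add sync"
for your version):

    # register both ends with the laptop's own bucardo
    bucardo add db central dbname=appdb host=central-rds.example.com user=bucardo
    bucardo add db laptop  dbname=appdb host=localhost user=bucardo

    # group the application tables
    bucardo add table all db=central relgroup=app_tables

    # two-way sync between this laptop and the central server
    bucardo add sync client_nn relgroup=app_tables dbs=central:source,laptop:source

    # record deltas for rows bucardo itself writes to the central db,
    # so the other laptops' syncs can pick them up later
    bucardo update db central makedelta=on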

In theory this works pretty well, and in practice it also works pretty well
until we get to about 4 client laptops. At that point, lock contention from
slow updates and whatever other checking happens produces lots of
serialization errors (which usually resolve on the 2nd or 3rd try) and pegs
the reasonably powerful Amazon RDS instance at 100% utilization. I have
very little confidence that this solution will scale beyond 4 concurrent
clients.

We're currently using 4.99.11; updating to current head is on my task list.

I believe that the proper solution to our issues is to switch things around
so that we're running a single central bucardo process with a single set of
syncs. However, for this to be successful the "in the works" database-down
detection would need to be complete, since we don't want a sync to stall or
fail on a down database (because most of the databases will be down most of
the time). Instead, the sync should just skip databases that are down and
catch them up when they come back online. I know that's not available yet.
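
In that model the whole thing would collapse to a single sync defined on
the central server, something like this (again, hypothetical names):

    bucardo add sync everyone relgroup=app_tables \
        dbs=central:source,laptop1:source,laptop2:source,laptop3:source

with bucardo treating any unreachable laptop as temporarily out of the
sync and replaying the accumulated deltas when it next shows up, rather
than stalling or failing the whole sync.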

In the meantime, are there any suggestions on how to overcome our current
scaling limitations? Would it be useful to change over to a central
bucardo server but still have client-specific syncs that continue to use
makedelta? I don't really see how that would solve the current contention
issue, but maybe it would. Any other thoughts?

Chris.


-- 
Chris Keane | Track Intelligence Inc | +1 (650) 703 5523 (cell)