[Bucardo-general] feedback from testing Bucardo for enterprise use

Greg Sabino Mullane greg at endpoint.com
Mon Mar 29 15:02:05 UTC 2010


Thanks for the feedback! Some quick inline comments:

> 'member' table which is replicated back from each remote cluster to
> the centre, ensuring that the primary keys of the members do not
> conflict.

Have you thought about using staggered sequences to accomplish this?
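
For example (a sketch; pick an increment at least as large as the
number of clusters you will ever have), each cluster gets the same
increment but a different start, so the generated keys never collide:

    -- On remote cluster 1:
    CREATE SEQUENCE member_id_seq INCREMENT 20 START 1;

    -- On remote cluster 2:
    CREATE SEQUENCE member_id_seq INCREMENT 20 START 2;

    -- ...and so on: cluster N uses START N, leaving room for up
    -- to 20 clusters before any ids could overlap.

Cluster 1 then generates 1, 21, 41, ... while cluster 2 generates
2, 22, 42, ...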

> Problem 1: When a remote cluster becomes unavailable, other syncs fail.

This is on the TODO list. Specifically, I'm envisioning a flag to allow 
replication to continue when a slave goes down, flipping it inside 
Bucardo to another status (perhaps "unreachable"), and then allowing 
it to flip back and catch up once the connection is re-established. 
One danger is that if the slave never comes back up, or takes a very 
long time to do so, the bucardo_delta and bucardo_track tables will 
grow large, as we can't remove their rows until all slaves have the data.

> This is a deal-breaker for me, and I imagine any other serious users
> of this software. If Bucardo.pm could handle these errors more
> gracefully, and when one sync failed move on to the next, it would
> make a big difference.

If I understand your example correctly, it's not so much graceful 
sync recovery that is needed as graceful slave recovery, which could 
really be the same thing, if your syncs are set up that way.

> Problem 2: The more syncs a table is involved in, the longer any
> transactions on it take.
> Every sync adds an additional trigger to each table it's watching. If
> I have a dozen remote databases, these 12 triggers add noticeable
> overhead to the times required to modify each record. Could the delta
> table simply be touched once for each record? This would speed up all
> updates on these tables, reduce the delta table size, and speed up
> delta table lookups, done during the syncs.

There should only be a single trigger, named "bucardo_add_delta", for each 
table. If the bucardo_delta table is getting 12 inserts per update, 
there is something seriously wrong. Each sync adds its own *notification* trigger, 
which is simply a statement-level trigger that fires a NOTIFY. You can turn this 
off by adjusting the "ping" parameter at the sync and/or table level. As long 
as you have ping enabled for at least one table involved in a sync, you'll 
get automatic kickoff of the syncs. The other option is to set a 
timeout (sync.checktime).
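
For reference, each of those per-sync ping triggers boils down to 
something like this (the names here are illustrative, not the exact 
ones Bucardo generates):

    CREATE OR REPLACE FUNCTION bucardo_kick_mysync() RETURNS trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        -- Statement-level, so it fires once per statement, not per row
        NOTIFY bucardo_kick_sync_mysync;
        RETURN NULL;
    END;
    $$;

    CREATE TRIGGER bucardo_kick_mysync
        AFTER INSERT OR UPDATE OR DELETE ON member
        FOR EACH STATEMENT EXECUTE PROCEDURE bucardo_kick_mysync();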

However, your use case is slightly different from that. Currently, 
there is no easy way to have the trigger notify multiple syncs in one fell 
swoop. You could probably make your own trigger easily enough that does 
12 different NOTIFYs at once. It's possible we could make a "kick all syncs" 
global NOTIFY trigger...other ideas welcome.
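
As a sketch, a single replacement trigger function could simply be 
(the sync channel names are placeholders):

    CREATE OR REPLACE FUNCTION bucardo_kick_all_syncs() RETURNS trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        -- One NOTIFY per sync; each Bucardo listener picks up its own
        NOTIFY bucardo_kick_sync_sync1;
        NOTIFY bucardo_kick_sync_sync2;
        -- ...one line per remaining sync...
        NOTIFY bucardo_kick_sync_sync12;
        RETURN NULL;
    END;
    $$;

You would then turn ping off for the individual syncs so that this 
one statement-level trigger is the only notification trigger left 
on the table.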

> Problem 3: Large sync operations take a lot of memory
> If a remote database is unavailable for extended periods, the amount
> of data to transfer will get pretty big. It appears that Bucardo
> doesn't pass along the data in chunks, but buffers it all in memory.

Yes, this is a known issue, and the solution is pretty much as you've said. 
The future idea is to pick some sort of cutoff, perhaps a number of 
delta rows, and once we go over that, start doing replication in smaller 
blocks delimited by the transaction times.
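
As a rough sketch of that idea (assuming bucardo_delta's txntime and 
rowid columns), each pass would pick a boundary covering roughly the 
N oldest delta rows, then replicate just that block:

    -- Find a transaction-time boundary covering about the oldest
    -- 10000 delta rows; blocks always end on a transaction boundary,
    -- so each pass copies a consistent slice.
    SELECT max(txntime) AS boundary
      FROM (SELECT txntime
              FROM bucardo.bucardo_delta
             ORDER BY txntime
             LIMIT 10000) AS oldest;

    -- Replicate only rows at or before that boundary, then repeat
    -- until caught up ('<boundary>' filled in by the controller):
    SELECT DISTINCT rowid
      FROM bucardo.bucardo_delta
     WHERE txntime <= '<boundary>';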

One way to get the memory back down is to look at the sync columns 
lifetime and maxkicks, to restart the controller and kids after a 
certain threshold. Crude, but it sometimes helps clear up memory 
problems (until we can solve it properly).
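
If you want to try that, both columns live on the sync table in the 
main Bucardo database; for example (the sync name and values below 
are placeholders):

    -- Restart this sync's controller and kids after 100 kicks or one
    -- hour, whichever comes first, releasing accumulated memory
    UPDATE bucardo.sync
       SET maxkicks = 100,
           lifetime = '1 hour'
     WHERE name = 'membersync';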


-- 
Greg Sabino Mullane greg at endpoint.com
End Point Corporation
PGP Key: 0x14964AC8