[Bucardo-general] Optimizing the swap sync process

Greg Sabino Mullane greg at endpoint.com
Tue Mar 10 02:12:26 UTC 2009


I'd like to outline one of the ideas I had for improving swap syncs. Any
thoughts or feedback appreciated.

Right now, the swap sync gathers up all rows of interest on both the source and
target side by doing a left join of the bucardo_delta table against the table in
question. Then it goes through each row it finds, figures out what to do with
it, and finally makes the changes, by doing a delete, an update, or an insert,
as needed.

The first optimization is to get rid of the updates entirely and make it a
batched delete + COPY process, as pushdelta in git now does. That's mostly done
at this point.

However, the real optimization comes when we realize that for some systems,
actual conflicts are rare, and 99% of the time we will simply be doing inserts,
either source->target, or target->source, especially on systems in which the
primary keys are sequence based, and the sequences are staggered for the two
sides. If this is the case, we can avoid the left join altogether on the
assumption that no overlapping rows will be found. So, like the current
pushdelta, we simply get the list of distinct rows from bucardo_delta. Then we
COPY the data rows over to the other side. (We don't even need to delete
first, as pushdelta does.) If the COPY fails, we fall back to the more traditional
approach. But if it works, we've cut out a *lot* of steps and probably gained
quite a speed advantage.
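
A minimal sketch of that "try the fast path, fall back on failure" flow, written
in Python rather than Bucardo's own code. Everything here is an assumption made
for illustration: SQLite stands in for PostgreSQL, a bulk INSERT stands in for
COPY, the table t(id, val) is hypothetical, and traditional_sync is only a
placeholder for the existing row-by-row logic (it simply lets the incoming row
win, which is not how real conflict handling would be decided).

```python
import sqlite3


def make_db(rows):
    """Create an in-memory database with a hypothetical table t(id, val)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
    db.executemany("INSERT INTO t VALUES (?, ?)", rows)
    db.commit()
    return db


def traditional_sync(target, rows):
    """Placeholder for the slow path: handle each row individually.

    Assumption for this sketch only: the incoming row always wins.
    """
    with target:  # one transaction; commits on success
        for pk, val in rows:
            target.execute("DELETE FROM t WHERE id = ?", (pk,))
            target.execute("INSERT INTO t VALUES (?, ?)", (pk, val))
    return "fallback"


def optimistic_sync(target, rows):
    """Bulk-insert on the assumption that no primary keys overlap.

    If any key already exists, the whole batch is rolled back and the
    traditional path is used instead.
    """
    try:
        with target:  # rolls back automatically if an exception is raised
            target.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return "fast"
    except sqlite3.IntegrityError:
        return traditional_sync(target, rows)
```

The key point is that the failed bulk load leaves nothing behind: the batch is
all-or-nothing, so the fallback starts from the same state the fast path saw.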

This will probably be a per-sync toggle, indicating that Bucardo should attempt
to optimize the sync in this way. Perhaps a per-table one as well.

Another sub-optimization is for Bucardo to start tracking the sequence numbers
itself, or at least the highest rowid replicated, for rows that use simple
incrementing numbers for their primary keys. Thus, Bucardo can know when it
should try the optimized "COPY only" approach, by treating rows higher than the
last known rowid as likely to be unique to that database. We could even have
Bucardo break the sync into two passes, one to COPY over the "new" rows, and the
second to handle any "not so new" rows in the traditional manner.
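
The two-pass split could look roughly like the Python sketch below. The function
name and the idea of passing in a stored "last replicated id" are assumptions
for illustration only; how Bucardo would actually record that value is not
decided here.

```python
def split_by_high_water(changed_ids, last_replicated_id):
    """Partition changed primary keys around the highest id already replicated.

    Ids above the mark are likely unique to this database and are candidates
    for the "COPY only" pass; the rest go through the traditional pass.
    """
    new_rows = sorted(i for i in changed_ids if i > last_replicated_id)
    older_rows = sorted(i for i in changed_ids if i <= last_replicated_id)
    return new_rows, older_rows
```

For example, with a last known rowid of 100 and changed rows {5, 99, 101, 150},
rows 101 and 150 would go to the COPY pass and rows 5 and 99 to the traditional
pass.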

P.S. For the record, I'm also working on ways to make the "slaves" read-only,
with the use of triggers and rules. This is mostly done, and is extra-nice as
one will be able to indicate the entire DB should be read-only (e.g. triggers on
the tables not controlled by Bucardo). Also, you will be able to turn the
restriction on or off on a per-table basis.

-- 
Greg Sabino Mullane greg at endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
