[Bucardo-general] Design of new txntime field

Greg Sabino Mullane greg at endpoint.com
Thu Sep 7 14:27:37 UTC 2017

I'm working on some major architectural changes to Bucardo, and this issue has 
popped up again. Quick background of the problem:

Bucardo works by knowing which rows in a table have changed since the last 
time it ran. Thus, we need to store the primary keys - this is done in the 
delta table. It also has a "txntime" next to each primary key column. This 
is used for two purposes:

* As a unique identifier for the delta. It is too expensive to have the main 
process remove the deltas right away, so it is done at a later time. We do 
mark the row as done for this sync, however, by adding all the distinct 
txntimes from the delta table we just processed to the track table. Other 
syncs can thus ignore those rows by looking at the track table. This also 
allows for the case where a table is replicated by more than one sync - so 
we cannot remove a delta until all the syncs have processed it

* The other purpose is for conflict resolution. The timestamp, in adition to 
being unique, is a handy way of knowing when the row changed, which we can 
use to determine which server "wins" in case the same primary key was changed. 
This use case is much less used.

So the problem is that timestamp has a maximum precision of 6, which in rare 
cases can lead to two processes generating the same value for now(), and thus 
creating identical rows in bucardo_delta - which really confuses Bucardo, as 
it expects txntime to be unique.

One possible solution is a bigint, tied to a sequence. This gets around the 
collision problem, but as we must call nextval() inside the trigger, we lose 
the many-to-one benefit of the same number for a process updating many rows 
(e.g. the track table will have to store a lot more distinct 'txntime' rows). 
Also, no more time-based conflict resolution. And calling nextval() is probably 
less perfomant than using now().

Another is to use txnid_current(), which basically is a unique bigint that 
will be the same inside a transaction ( much like now() ). The disadvantages 
are the loss of timestamp, and the worry that things could get wrapped around. 
That seems small, however, and the docs indicate there is an 'epoch' padding, 
although I'm not clear on how/where that gets applied.

Other than that, I'm not sure. The ideal is something unique, that gives a 
time hint of some kind, and that doesn't take up a lot of space (as delta 
tables can get quite big quite quickly). Any ideas? A higher resolution 
timestamp would work, or combining the timestamp with something else to 
prevent collisions, but those will make the txntime column larger.

Thus, the current best idea I have is to combine the timestamp 
with txid_current, which basically adds some precision to the 
timestamp, e.g. now() || txid_current()::text

This means we have to store it as a text, and not a timestamp field, 
but it still sorts, and we don't do any other time-based queries on 
the field, other than the vac process, but that can be changed.

Greg Sabino Mullane greg at endpoint.com
End Point Corporation
PGP Key: 2529 DF6A B8F7 9407 E944  45B4 BC9B 9067 1496 4AC8
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 163 bytes
Desc: not available
URL: <https://mail.endcrypt.com/pipermail/bucardo-general/attachments/20170907/c46e7bfa/attachment.sig>

More information about the Bucardo-general mailing list