[Bucardo-general] Design of new txntime field
Greg Sabino Mullane
greg at endpoint.com
Thu Sep 7 14:27:37 UTC 2017
I'm working on some major architectural changes to Bucardo, and this issue has
popped up again. Quick background of the problem:
Bucardo works by knowing which rows in a table have changed since the last
time it ran. Thus, we need to store the primary keys - this is done in the
delta table. It also has a "txntime" next to each primary key column. This
is used for two purposes:
* As a unique identifier for the delta. It is too expensive to have the main
process remove the deltas right away, so it is done at a later time. We do
mark the row as done for this sync, however, by adding all the distinct
txntimes from the delta table we just processed to the track table. Other
syncs can thus ignore those rows by looking at the track table. This also
allows for the case where a table is replicated by more than one sync - so
we cannot remove a delta until all the syncs have processed it
* The other purpose is for conflict resolution. The timestamp, in adition to
being unique, is a handy way of knowing when the row changed, which we can
use to determine which server "wins" in case the same primary key was changed.
This use case is much less used.
So the problem is that timestamp has a maximum precision of 6, which in rare
cases can lead to two processes generating the same value for now(), and thus
creating identical rows in bucardo_delta - which really confuses Bucardo, as
it expects txntime to be unique.
One possible solution is a bigint, tied to a sequence. This gets around the
collision problem, but as we must call nextval() inside the trigger, we lose
the many-to-one benefit of the same number for a process updating many rows
(e.g. the track table will have to store a lot more distinct 'txntime' rows).
Also, no more time-based conflict resolution. And calling nextval() is probably
less perfomant than using now().
Another is to use txnid_current(), which basically is a unique bigint that
will be the same inside a transaction ( much like now() ). The disadvantages
are the loss of timestamp, and the worry that things could get wrapped around.
That seems small, however, and the docs indicate there is an 'epoch' padding,
although I'm not clear on how/where that gets applied.
Other than that, I'm not sure. The ideal is something unique, that gives a
time hint of some kind, and that doesn't take up a lot of space (as delta
tables can get quite big quite quickly). Any ideas? A higher resolution
timestamp would work, or combining the timestamp with something else to
prevent collisions, but those will make the txntime column larger.
Thus, the current best idea I have is to combine the timestamp
with txid_current, which basically adds some precision to the
timestamp, e.g. now() || txid_current()::text
This means we have to store it as a text, and not a timestamp field,
but it still sorts, and we don't do any other time-based queries on
the field, other than the vac process, but that can be changed.
Greg Sabino Mullane greg at endpoint.com
End Point Corporation
PGP Key: 2529 DF6A B8F7 9407 E944 45B4 BC9B 9067 1496 4AC8
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 163 bytes
Desc: not available
More information about the Bucardo-general