[Bucardo-general] Design of new txntime field
Greg Sabino Mullane
greg at endpoint.com
Fri Nov 6 13:43:08 UTC 2015
Recent emails to the list have pointed out a shortcoming in the
way we track replicated rows. Currently, when a row is changed,
a trigger adds a rows to the table's "delta" table, containing
the primary key that was changed, and the current_timestamp of
when the txn was started (txntime). Similarly, the "track" table uses
a timestamptz field named "txntime" to match up with entries in
the delta table - when a delta row has the same number of matching
entries in the track table as number of syncs replicating to
the source table, we know we can remove the delta row.
The problem is that, even with microsecond resolution, there is
still a chance that two backends could modify the table at
the same time - or at least, get the exact same time recorded
as their transaction start time (which is really time of first
query). This is not good, as the txntime is no longer unique and
no loger suitable for use as a (sort of) primary key.
We could store more precision in the timestamp, but not only are
we already limited in what can be stored via current_timestamp, we
start having to worry about compile-time settings.
My idea is to use the PID of the backend as well. In theory, this
should be unique when combined with the timestamp, because even
if two backends manage to grab the exact same timestamp, they will always
be different PIDs. Another process may come along and recycle one
of those PIDs, but that will not matter as the timestamp will have
changed.
The goals of the txntime column are:
* unique
* human-readable timestamp
* sortable (esp. for conflict handling)
One solution is to change it to a text field, then dump the formatted
timestamp plus the PID:
greg=# select to_char(current_timestamp,'YYYY-MM-dd.HH24:MI:US ') || pg_backend_pid();
?column?
-------------------------------
2015-11-06.08:35:460738 10638
This seems a pretty good solution, as it meets all the criteria. Having the PID
at the end means conflict resolution continues to work as well. Although it
means two servers with the exact same timestamp update will have a
non-deterministic conflict winner, the chance of the same timestamp is
extremely low, and in such a case there truly is no way to pick a
winner.
One concern is that the indexes on such a field will not be very optimal -
we could store the PID and/or the seconds first to create a wider tree,
but then we lose the sorting. I'm not convinced it will be much of a
problem, really, but wanted to throw it out there.
Feedback welcome.
--
Greg Sabino Mullane greg at endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 163 bytes
Desc: not available
URL: <https://mail.endcrypt.com/pipermail/bucardo-general/attachments/20151106/4e6ca299/attachment.sig>
More information about the Bucardo-general
mailing list