[Bucardo-general] Design of new txntime field

Greg Sabino Mullane greg at endpoint.com
Fri Nov 6 13:43:08 UTC 2015


Recent emails to the list have pointed out a shortcoming in the 
way we track replicated rows. Currently, when a row is changed, 
a trigger adds a rows to the table's "delta" table, containing 
the primary key that was changed, and the current_timestamp of 
when the txn was started (txntime). Similarly, the "track" table uses 
a timestamptz field named "txntime" to match up with entries in 
the delta table - when a delta row has the same number of matching 
entries in the track table as number of syncs replicating to 
the source table, we know we can remove the delta row.

The problem is that, even with microsecond resolution, there is 
still a chance that two backends could modify the table at 
the same time - or at least, get the exact same time recorded 
as their transaction start time (which is really time of first 
query). This is not good, as the txntime is no longer unique and 
no loger suitable for use as a (sort of) primary key.

We could store more precision in the timestamp, but not only are 
we already limited in what can be stored via current_timestamp, we 
start having to worry about compile-time settings.

My idea is to use the PID of the backend as well. In theory, this 
should be unique when combined with the timestamp, because even 
if two backends manage to grab the exact same timestamp, they will always 
be different PIDs. Another process may come along and recycle one 
of those PIDs, but that will not matter as the timestamp will have 
changed.

The goals of the txntime column are:

* unique
* human-readable timestamp
* sortable (esp. for conflict handling)

One solution is to change it to a text field, then dump the formatted 
timestamp plus the PID:

greg=# select to_char(current_timestamp,'YYYY-MM-dd.HH24:MI:US ') || pg_backend_pid();
           ?column?            
-------------------------------
 2015-11-06.08:35:460738 10638

This seems a pretty good solution, as it meets all the criteria. Having the PID 
at the end means conflict resolution continues to work as well. Although it 
means two servers with the exact same timestamp update will have a 
non-deterministic conflict winner, the chance of the same timestamp is 
extremely low, and in such a case there truly is no way to pick a 
winner.

One concern is that the indexes on such a field will not be very optimal - 
we could store the PID and/or the seconds first to create a wider tree, 
but then we lose the sorting. I'm not convinced it will be much of a 
problem, really, but wanted to throw it out there.

Feedback welcome.

-- 
Greg Sabino Mullane greg at endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 163 bytes
Desc: not available
URL: <https://mail.endcrypt.com/pipermail/bucardo-general/attachments/20151106/4e6ca299/attachment.sig>


More information about the Bucardo-general mailing list