[Bucardo-general] Design of new txntime field

Ioana Danes ioanadanes at gmail.com
Fri Nov 6 14:22:34 UTC 2015

Hi Greg,

On Fri, Nov 6, 2015 at 8:43 AM, Greg Sabino Mullane <greg at endpoint.com>

> Recent emails to the list have pointed out a shortcoming in the
> way we track replicated rows. Currently, when a row is changed,
> a trigger adds a rows to the table's "delta" table, containing
> the primary key that was changed, and the current_timestamp of
> when the txn was started (txntime). Similarly, the "track" table uses
> a timestamptz field named "txntime" to match up with entries in
> the delta table - when a delta row has the same number of matching
> entries in the track table as number of syncs replicating to
> the source table, we know we can remove the delta row.
> The problem is that, even with microsecond resolution, there is
> still a chance that two backends could modify the table at
> the same time - or at least, get the exact same time recorded
> as their transaction start time (which is really time of first
> query). This is not good, as the txntime is no longer unique and
> no loger suitable for use as a (sort of) primary key.
> We could store more precision in the timestamp, but not only are
> we already limited in what can be stored via current_timestamp, we
> start having to worry about compile-time settings.
> My idea is to use the PID of the backend as well. In theory, this
> should be unique when combined with the timestamp, because even
> if two backends manage to grab the exact same timestamp, they will always
> be different PIDs. Another process may come along and recycle one
> of those PIDs, but that will not matter as the timestamp will have
> changed.
> The goals of the txntime column are:
> * unique
> * human-readable timestamp
> * sortable (esp. for conflict handling)
> One solution is to change it to a text field, then dump the formatted
> timestamp plus the PID:
> greg=# select to_char(current_timestamp,'YYYY-MM-dd.HH24:MI:US ') ||
> pg_backend_pid();
>            ?column?
> -------------------------------
>  2015-11-06.08:35:460738 10638
> This seems a pretty good solution, as it meets all the criteria. Having
> the PID
> at the end means conflict resolution continues to work as well. Although it
> means two servers with the exact same timestamp update will have a
> non-deterministic conflict winner, the chance of the same timestamp is
> extremely low, and in such a case there truly is no way to pick a
> winner.
> One concern is that the indexes on such a field will not be very optimal -
> we could store the PID and/or the seconds first to create a wider tree,
> but then we lose the sorting. I'm not convinced it will be much of a
> problem, really, but wanted to throw it out there.
> Feedback welcome.
In our database, we have 2 big tables and one of them that has about 300
million records has an index on a text field that has 16-20 characters.
When I have to restore a backup of the database that index takes 10 times
longer than any other index. I know the delta and track table won't get
that big so it might not be a problem for this app.
How about using 2 fields in the primary key. One the current txntime field
and one a serial id field.

Thanks for looking into this.


> Greg Sabino Mullane greg at endpoint.com
> End Point Corporation
> PGP Key: 0x14964AC8
> _______________________________________________
> Bucardo-general mailing list
> Bucardo-general at bucardo.org
> https://mail.endcrypt.com/mailman/listinfo/bucardo-general
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.endcrypt.com/pipermail/bucardo-general/attachments/20151106/095a01ad/attachment.html>

More information about the Bucardo-general mailing list