[Bucardo-general] Design of new txntime field

Greg Sabino Mullane greg at endpoint.com
Sun Sep 10 19:57:21 UTC 2017

On Thu, Sep 07, 2017 at 07:47:24PM -0600, Jon Jensen wrote:
> Remind me: Do we need to use the transaction start timestamp, that is,
> now()? Or should we be using clock_timestamp() to get the wallclock time at
> each invocation?

We don't *need* the start time, but it is greatly preferred as it 
reduces the size of the track table entries (which populates via 
a SELECT DISTINCT txntime), makes it easy to group the deltas 
by eyeballing them, and decreases the chance of duplicates (but see 

> Postgres doesn't offer any higher-resolution timestamp, does it? I thought
> microsecond was it.
> I suppose you could ask the OS for a nanosecond time, and pass it in.

Yes, that's as good as it gets. We can't ask the OS for that: too expensive 
for the triggers probably.

> That won't sort unless you pad txid_current(), right? As a stringified
> integer with varying digits, it'll depend on number of digits.
> And actually the timestamp won't sort right as a string either, since it
> will have fewer digits sometimes when the microseconds end with 0 and are
> truncated, leaving fewer string digits.

Excellent points, I'm glad you spotted that. I've been playing with this today, 
and we can run now() back through to_char to get the full microseconds 
each time like this:

select to_char(now(), 'YYYY-MM-DD HH24:MI:SS.US');

Then I started looking at formatting the txnid_current() output. Problem is, 
it's a bigint, which means quite a lot of padding for most, if not all, 
standard installations. Then I had another idea - what about the PID? Those 
are distributed sequentially or randomly, but all that really matters is 
that two backends who happen to have the exact same now() have diferent 
PIDs - which should always happen. Then the question of how big to make 
the string - PIDs can be forced quite high with some effort, but for purposes 
of coliision avoidance, we probably only need some of the digits. So I came 
up with:

select to_char(right(pg_backend_pid()::text, 4)::int, 'FM00000');

While PIDs can be negative on some systems, that should not be a problem 
either - it still makes a good tie breaker.

We can also compress things a little - we want to keep the timestamp human 
readable, but we don't need all the whitespace that now() gives by default. 
This, the final proposal becomes:

select to_char(now(), 'YYYYMMDD.HH24MISS.US.')::text || to_char(right(pg_backend_pid()::text, 5)::int,'FM00000');

The output looks like this:


That's 28 characters, which is not too bad overall. And it is still human readable, 
and sortable.

So how does it do in the criteria I made for the new txntime?

* No collisions
The addition of the last five of the current PID should avoid collisions. 
In theory, there may still be collisions - which we can reduce further by 
going to 6 or 7 digits, but is that really worth it? On a system using 
6 or more digits, you'd have to have two backends created exactly 
100,000 PIDs apart (if sequential), or happening to have the same last 
five digits (if random), a 9 in 10,0000 chance, or 99 in 10,000.
Think I just talked myself into 6 digits. :)

* Human readable
Not quite as intuitive as the existing timestamp, but fairly obvious and 
easy to read. And we can document the right-hand-side of it. Might be 
worth it to force it all to GMT as well.

* Inexpensive
We went from a simple now() call to now() + pg_backend_pid() + three 
explicit casts + right() plus two to_char()s. Still, those are all 
really, really cheap. Since these can be called a *lot*, we don't 
want this to be too expensive

* Disk space
We want the delta tables to be as small as possible. Right now, this 
solution is a lot more space than a plain timestamp, but I don't 
see an easy way around this problem right now. I've made some improvements 
to the bucardo_purge_delta() functions that may offset this problem a 
little bit, by making it possible to keep the deltas trimmed after 
use as quickly as possible.

* Easy upgrade
This new solution can coexist as is with the old delta column, and 
conversion is simple - just pad things out with zeroes.

* Sortable
Well, that's the reason we go though so much trouble with right(), to_char(), 
etc.! :)

On rereading this, maybe we don't need the microseconds anymore either, as 
the PID chunk should be enough to distinguish things? Or at least we could 
do milliseconds, and save a few characters.

Greg Sabino Mullane greg at endpoint.com
End Point Corporation
PGP Key: 2529 DF6A B8F7 9407 E944  45B4 BC9B 9067 1496 4AC8
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 163 bytes
Desc: not available
URL: <https://mail.endcrypt.com/pipermail/bucardo-general/attachments/20170910/06f33955/attachment.sig>

More information about the Bucardo-general mailing list