[Bucardo-general] Design of new txntime field
Greg Sabino Mullane
greg at endpoint.com
Sun Sep 10 19:57:21 UTC 2017
On Thu, Sep 07, 2017 at 07:47:24PM -0600, Jon Jensen wrote:
> Remind me: Do we need to use the transaction start timestamp, that is,
> now()? Or should we be using clock_timestamp() to get the wallclock time at
> each invocation?
We don't *need* the start time, but it is greatly preferred as it
reduces the size of the track table entries (which populates via
a SELECT DISTINCT txntime), makes it easy to group the deltas
by eyeballing them, and decreases the chance of duplicates (but see
below).
> Postgres doesn't offer any higher-resolution timestamp, does it? I thought
> microsecond was it.
>
> I suppose you could ask the OS for a nanosecond time, and pass it in.
Yes, that's as good as it gets. We can't ask the OS for that: too expensive
for the triggers probably.
> That won't sort unless you pad txid_current(), right? As a stringified
> integer with varying digits, it'll depend on number of digits.
>
> And actually the timestamp won't sort right as a string either, since it
> will have fewer digits sometimes when the microseconds end with 0 and are
> truncated, leaving fewer string digits.
Excellent points, I'm glad you spotted that. I've been playing with this today,
and we can run now() back through to_char to get the full microseconds
each time like this:
select to_char(now(), 'YYYY-MM-DD HH24:MI:SS.US');
Then I started looking at formatting the txnid_current() output. Problem is,
it's a bigint, which means quite a lot of padding for most, if not all,
standard installations. Then I had another idea - what about the PID? Those
are distributed sequentially or randomly, but all that really matters is
that two backends who happen to have the exact same now() have diferent
PIDs - which should always happen. Then the question of how big to make
the string - PIDs can be forced quite high with some effort, but for purposes
of coliision avoidance, we probably only need some of the digits. So I came
up with:
select to_char(right(pg_backend_pid()::text, 4)::int, 'FM00000');
While PIDs can be negative on some systems, that should not be a problem
either - it still makes a good tie breaker.
We can also compress things a little - we want to keep the timestamp human
readable, but we don't need all the whitespace that now() gives by default.
This, the final proposal becomes:
select to_char(now(), 'YYYYMMDD.HH24MISS.US.')::text || to_char(right(pg_backend_pid()::text, 5)::int,'FM00000');
The output looks like this:
20170910.143526.395762.04482
That's 28 characters, which is not too bad overall. And it is still human readable,
and sortable.
So how does it do in the criteria I made for the new txntime?
* No collisions
The addition of the last five of the current PID should avoid collisions.
In theory, there may still be collisions - which we can reduce further by
going to 6 or 7 digits, but is that really worth it? On a system using
6 or more digits, you'd have to have two backends created exactly
100,000 PIDs apart (if sequential), or happening to have the same last
five digits (if random), a 9 in 10,0000 chance, or 99 in 10,000.
Think I just talked myself into 6 digits. :)
* Human readable
Not quite as intuitive as the existing timestamp, but fairly obvious and
easy to read. And we can document the right-hand-side of it. Might be
worth it to force it all to GMT as well.
* Inexpensive
We went from a simple now() call to now() + pg_backend_pid() + three
explicit casts + right() plus two to_char()s. Still, those are all
really, really cheap. Since these can be called a *lot*, we don't
want this to be too expensive
* Disk space
We want the delta tables to be as small as possible. Right now, this
solution is a lot more space than a plain timestamp, but I don't
see an easy way around this problem right now. I've made some improvements
to the bucardo_purge_delta() functions that may offset this problem a
little bit, by making it possible to keep the deltas trimmed after
use as quickly as possible.
* Easy upgrade
This new solution can coexist as is with the old delta column, and
conversion is simple - just pad things out with zeroes.
* Sortable
Well, that's the reason we go though so much trouble with right(), to_char(),
etc.! :)
On rereading this, maybe we don't need the microseconds anymore either, as
the PID chunk should be enough to distinguish things? Or at least we could
do milliseconds, and save a few characters.
--
Greg Sabino Mullane greg at endpoint.com
End Point Corporation
PGP Key: 2529 DF6A B8F7 9407 E944 45B4 BC9B 9067 1496 4AC8
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 163 bytes
Desc: not available
URL: <https://mail.endcrypt.com/pipermail/bucardo-general/attachments/20170910/06f33955/attachment.sig>
More information about the Bucardo-general
mailing list