[Bucardo-general] Fwd: Syncing smaller batches

Ioana Danes ioanadanes at gmail.com
Thu Nov 5 16:23:44 UTC 2015


Hi Greg,

On Thu, Nov 5, 2015 at 9:57 AM, Greg Sabino Mullane <greg at endpoint.com>
wrote:

> On Wed, Nov 04, 2015 at 02:37:58PM -0500, Ioana Danes wrote:
> > Ok the query is not stuck it is just very slow. It could be networking
> > problem or client problem. The 2 servers are on the same msa though.
> >
> > I can reproduce the problem quite easily if I stop the sync for few
> minutes
> > so I could collect around 400,000 records to be synced (~ 200,000 for
> each
> > of my 2 tables).
> >
> > The query is :
> > SELECT  DISTINCT "testtableid"
> > FROM    bucardo.delta_abrazo_testtable d
> >                      WHERE   NOT EXISTS
> > (
> >                                 SELECT
> > 1
> >                                 FROM   bucardo.track_abrazo_testtable t
> >                                 WHERE  d.txntime = t.txntime
> >                                 AND    (t.target = 'dbgroup
> > sync_all1'::text)
> >
> > )
> >
> > A regular sync operation at 780 clients takes 2-3 seconds
> > Then I start the sync script and the first few syncs are slower than
> normal
> > (< 75 secs) because they have a lot to sync but then the third or fourth
> > takes even longer ~ 270 seconds.
>
> Thanks for your work on hunting this down. We need to determine next if
> it is the query itself that is slow or something else. The Bucardo logs
> should help (even better, running with log_level=debug). You could also
> set log_min_duration_statement = 100 in postgresql.conf and see if
> the above query shows up.
>

I had log_min_duration_statement set at 2 seconds and the time for the
statement was logged at something over 200  seconds, I don't remember
exactly but I know for sure that was two hundred and so.
I will enable the debugging and do more tests.

In meantime I got over another issue where couple of records were not
replicated for each of my big tables. I only use one sync for all the
tables as with the latest improvements it is fast enough.

I checked the code and I think I found the scenario where records could be
missed to be replicated.It is caused by the assumption that txntime is
unique or all the records with the same txntime are processed during the
same sync.

Let's say I have 2 dbs:
db1 = MASTER, all the clients connect to that db
db2 = STANDBY

1 *First transaction* *begins* and inserts a record into table1 with
table_id = 1.
This generates an entry in delta_table1 with (table_id, txntime) = (1,
'2015-11-05 13:45:55.657-04')

2 *Second transaction* *begins* in the same time and inserts a record into
table1 with table_id = 2.
This generates an entry in delta_table1 with (table_id, txntime) = (2,
'2015-11-05 13:45:55.657-04')

Because the transactions started in the same time, now() returns the same
value for both so txntime will be the same for the 2 records in the delta
table

3 Then the* first transaction commits*

4* First sync starts*

5

*Second transaction commits*
6 *First sync finishes *and it replicated only the record with table_id = 1
because the second transaction was not committed when the sync started.

The sync process also inserts all the txntime records that were processed
into the track table:

INSERT INTO bucardo.track_table1 (txntime,target)
SELECT DISTINCT txntime, 'dbgroup sync_all1'::text
FROM bucardo.delta_table1 d
WHERE NOT EXISTS (
SELECT 1 FROM bucardo.track_table1 t WHERE d.txntime = t.txntime AND
(t.target = 'dbgroup sync_all1'::text)
);

*7 The second sync starts* and it should sync the second record with
table_id = 2 but it does not pick it up because it only takes the records
with txntime that were not prcessed (not in track table):

SELECT  DISTINCT "table_id"
FROM bucardo.delta_table1 d
WHERE   NOT EXISTS (
SELECT 1 FROM bucardo.track_table1 t WHERE  d.txntime = t.txntime AND
(t.target = 'dbgroup sync_all1'::text)
);

It takes me about 30 mins to reproduce and it only happens because of the
volume of inserts I have
*.*
I will change the default value for txntime with something unique to
perform some tests and let you know if it resolves my problem.


>
> --
> Greg Sabino Mullane greg at endpoint.com
> End Point Corporation
> PGP Key: 0x14964AC8
>


Ioana
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.endcrypt.com/pipermail/bucardo-general/attachments/20151105/d8571958/attachment.html>


More information about the Bucardo-general mailing list