[Bucardo-general] Bucardo sync cause could not receive data from client: Connection reset by peer

Thu Jun 15 14:37:38 UTC 2017

Hi Greg and team, thanks for your response.

I tried out various scenarios, and Bucardo works fine. The "could not
receive data from client: Connection reset by peer" problem on the postgres
nodes was caused by AWS server constraints. My confusion came from the
ambiguous messages in the postgres and Bucardo logs.

I tested in a local environment with virtual machines that had few
resources. It took a long time, but the synchronization finished.

Then I ran many tests on AWS with different numbers of records to
determine the threshold. I monitored CPU, RAM, I/O, and network, among
other key parameters, and I saw peaks in network bandwidth.

We upgraded the server specifications and were able to replicate a COPY of
up to 12.5 million rows. The COPY took approximately 1 minute and the
synchronization took about 3 minutes.

-------------------------------
2017-06-02 02:48:32.512171+00
COPY 12499999
2017-06-02 02:51:31.466631+00

(24825) [Fri Jun 2 02:51:46 2017] KID (aa_sync) Delta count for
db_01.public.personal: 12499999
(24825) [Fri Jun 2 02:54:44 2017] KID (aa_sync) Totals: deletes = 0 inserts
= 12499999 conflicts = 0
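
As a rough check on the two KID log lines above, the sync interval and row
rate can be worked out with a short script. This is only a sketch: the
timestamps and row counts are copied from the log lines in this thread, the
variable and function names are my own, and the 18M projection assumes the
rate stays constant, which may not hold in practice.

```python
from datetime import datetime

# Timestamp layout used in the Bucardo KID log lines (English day/month names).
LOG_FMT = "%a %b %d %H:%M:%S %Y"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two Bucardo log timestamps."""
    delta = datetime.strptime(end, LOG_FMT) - datetime.strptime(start, LOG_FMT)
    return delta.total_seconds()


# Values copied from the log excerpt above.
rows = 12_499_999
sync_s = elapsed_seconds("Fri Jun 2 02:51:46 2017", "Fri Jun 2 02:54:44 2017")
rows_per_s = rows / sync_s

# Assumption: the same rate would hold for the 18,903,143-row delta
# reported in the earlier failed run.
projected_s = 18_903_143 / rows_per_s

print(f"sync took {sync_s:.0f} s, about {rows_per_s:,.0f} rows/s")
print(f"18.9M rows at that rate: about {projected_s / 60:.1f} minutes")
```

The interval matches the "about 3 minutes" figure reported for the
synchronization.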

For the 18M test, we followed the process you recommended. If we upgrade
the server with more bandwidth, CPU, and RAM, maybe it could pass the 18M
test. The 12 million rows weigh over 4 GB, which has to be transmitted
over the network.

I am planning to move to the production environment in a few days. When I
finish, I will tell you how it works.

Thanks for your time.

2017-06-14 16:31 GMT-04:00 Greg Sabino Mullane <greg at endpoint.com>:

> > (25903) [Fri May 26 22:08:59 2017] KID (aa_sync) Delta count for
> > name_01.public.personas : 18903143
> > (23975) [Fri May 26 22:09:46 2017] CTL Warning: Kid 25903 is not
> > responding, will respawn
>
> My guess is that with a delta collection that large (18M rows!) Bucardo
> is taking a really, really long time to walk through them, and some other
> timeout is getting hit, causing the disconnection.
>
> If that 18M represents almost all of the rows in the table, I would
> try a bulk copy from one side to the other (assuming the table is
> meant to be identical).
>
> There are a number of ways to do so, but the simplest may be to stop
> writes to the table via your application, truncate one side, then
> copy the data from the other[1] (and setting session_replication_role
> to 'replica' so you don't create more deltas). Once that is done, you
> can also truncate the delta tables.
>
> Test first on a dev box of course!
>
> --
> Greg Sabino Mullane greg at endpoint.com
> End Point Corporation
> PGP Key: 2529 DF6A B8F7 9407 E944  45B4 BC9B 9067 1496 4AC8
>

-- 

*Ing. Alexis Arnal*
*0416-6182343*
aarnal at corcaribe.com

*Please, before printing this message, make sure it is necessary.
Let's help take care of the environment.*