[Bucardo-general] Bucardo sync cause could not receive data from client: Connection reset by peer
Alexis Arnal
aarnal at corcaribe.com
Thu Jun 15 14:37:38 UTC 2017
Hi Greg and team, thanks for your response.
I tried out various scenarios, and Bucardo itself works fine. The
"could not receive data from client: Connection reset by peer" errors on
the Postgres nodes were caused by AWS server resource constraints. My
confusion came from the ambiguous messages in the Postgres and Bucardo logs.
I first tested in a local environment on virtual machines with few
resources. It took a very long time, but the synchronization finished.
Then I ran many tests on AWS with different numbers of rows to
determine the threshold. I monitored CPU, RAM, I/O, and network, among
other key parameters, and I saw peaks in network bandwidth.
We upgraded the server specifications and were able to replicate a COPY
of up to 12.5 million rows. The COPY took approximately 1 minute and the
synchronization took about 3 minutes.
-------------------------------
2017-06-02 02:48:32.512171+00
COPY 12499999
2017-06-02 02:51:31.466631+00
(24825) [Fri Jun 2 02:51:46 2017] KID (aa_sync) Delta count for
db_01.public.personal: 12499999
(24825) [Fri Jun 2 02:54:44 2017] KID (aa_sync) Totals: deletes=0
inserts=12499999 conflicts=0
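As a rough check of the "about 3 minutes" figure, here is a small Python sketch that takes the two KID (aa_sync) timestamps above and works out the elapsed time and rows per second. It assumes those two log lines mark roughly the start and the end of the sync, which is only an approximation.

```python
from datetime import datetime

# Timestamps copied from the two KID (aa_sync) log lines above.
LOG_FORMAT = "%a %b %d %H:%M:%S %Y"
delta_seen = datetime.strptime("Fri Jun 2 02:51:46 2017", LOG_FORMAT)
totals_seen = datetime.strptime("Fri Jun 2 02:54:44 2017", LOG_FORMAT)

# Row count reported in the "Totals" line.
rows = 12_499_999

# Assumption: the gap between the two lines approximates the sync duration.
elapsed_s = (totals_seen - delta_seen).total_seconds()
rows_per_s = rows / elapsed_s

print(f"elapsed: {elapsed_s:.0f} s, about {rows_per_s:,.0f} rows/s")
```

That works out to a little under 3 minutes for the sync on the upgraded server.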
For the 18M test we followed the process you recommended. If we upgrade
the server with more bandwidth, CPU, and RAM, maybe I could pass the 18M
test. The 12 million rows weigh over 4 GB, which has to be transmitted
over the network.
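To get a feeling for what the 18M case means for the network, here is a back-of-the-envelope sketch in Python. It assumes that "over 4GB" can be treated as a lower bound of 4 * 10^9 bytes for the 12,499,999 rows of the test, and that the 18,903,143 rows from the earlier log have a similar average row size. Both are assumptions, not measurements.

```python
# Figures from this thread; the byte count is a lower bound ("over 4GB").
ROWS_TESTED = 12_499_999
BYTES_TESTED = 4 * 10**9      # assumption: decimal gigabytes
ROWS_TARGET = 18_903_143      # delta count from the quoted log

# Assumption: both tables have a similar average row size.
bytes_per_row = BYTES_TESTED / ROWS_TESTED
projected_gb = bytes_per_row * ROWS_TARGET / 10**9

print(f"at least {bytes_per_row:.0f} bytes/row")
print(f"at least {projected_gb:.2f} GB for {ROWS_TARGET:,} rows")
```

Under those assumptions the 18M run has to move roughly half as much data again as the 12.5M run, so the bandwidth peaks would last correspondingly longer.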
I am planning to move to the production environment in a few days. When
that is finished I will tell you how it works.
Thanks for your time.
2017-06-14 16:31 GMT-04:00 Greg Sabino Mullane <greg at endpoint.com>:
> > (25903) [Fri May 26 22:08:59 2017] KID (aa_sync) Delta count for
> > name_01.public.personas : 18903143
> > (23975) [Fri May 26 22:09:46 2017] CTL Warning: Kid 25903 is not
> > responding, will respawn
>
> My guess is that with a delta collection that large (18M rows!) Bucardo
> is taking a really, really long time to walk through them, and some other
> timeout is getting hit, causing the disconnection.
>
> If that 18M represents almost all of the rows in the table, I would
> try a bulk copy from one side to the other (assuming the table is
> meant to be identical).
>
> There are a number of ways to do so, but the simplest may be to stop
> writes to the table via your application, truncate one side, then
> copy the data from the other[1] (and setting session_replication_role
> to 'replica' so you don't create more deltas). Once that is done, you
> can also truncate the delta tables.
>
> Test first on a dev box of course!
>
> --
> Greg Sabino Mullane greg at endpoint.com
> End Point Corporation
> PGP Key: 2529 DF6A B8F7 9407 E944 45B4 BC9B 9067 1496 4AC8
>
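For anyone reading this later, the steps in the quoted message can be sketched as an ordered list of SQL statements. The Python below only builds the statements and executes nothing. The helper name bulk_copy_plan is hypothetical, and the delta and track table names (bucardo.delta_<schema>_<table> and bucardo.track_<schema>_<table>) are my assumption about Bucardo 5 naming, so please check them with \dt bucardo.* before truncating anything. Writes from the application should be stopped first, and setting session_replication_role normally needs a superuser.

```python
from typing import Dict, List


def bulk_copy_plan(schema: str, table: str) -> Dict[str, List[str]]:
    """Build (but do not run) the SQL for a one-off bulk copy of one table."""
    qualified = f"{schema}.{table}"
    # Assumed Bucardo 5 naming for the per-table delta and track tables;
    # verify the real names on your own database before using these.
    delta = f"bucardo.delta_{schema}_{table}"
    track = f"bucardo.track_{schema}_{table}"
    return {
        # Run on the side that holds the good data.
        "source": [f"COPY {qualified} TO STDOUT;"],
        # Run on the side being refilled; 'replica' keeps the ordinary
        # triggers from firing, so no new deltas are recorded.
        "target": [
            "BEGIN;",
            "SET LOCAL session_replication_role = 'replica';",
            f"TRUNCATE TABLE {qualified};",
            f"COPY {qualified} FROM STDIN;",
            "COMMIT;",
        ],
        # Run afterwards wherever the delta tables exist.
        "cleanup": [f"TRUNCATE TABLE {delta};", f"TRUNCATE TABLE {track};"],
    }


plan = bulk_copy_plan("public", "personas")
for side, statements in plan.items():
    print(side)
    for statement in statements:
        print("   ", statement)
```

The output of the source COPY has to be piped into the target COPY by whatever client you use. As Greg says, test it on a dev box first.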
--
Ing. Alexis Arnal
0416-6182343
aarnal at corcaribe.com
Please make sure it is necessary before printing this message.
Let's help take care of the environment.