[Bucardo-general] Kid is not responding,

Videanu Adrian videanuadrian at yahoo.com
Thu Oct 6 13:30:24 UTC 2016


Hi David,

Thanks for your response. It turns out that the Bucardo server did not have enough memory, so when the memory filled up, the kid died. I have upgraded that server from 2 GB to 12 GB of RAM, and Bucardo seems to keep almost 5 GB busy.

 Regards, Adrian Videanu

      From: David Christensen <david at endpoint.com>
 To: Videanu Adrian <videanuadrian at yahoo.com> 
Cc: "bucardo-general at bucardo.org" <bucardo-general at bucardo.org>
 Sent: Wednesday, October 5, 2016 6:13 PM
 Subject: Re: [Bucardo-general] Kid is not responding,
   
Hi Videanu,

> Hi all, 
> I have a four-master Bucardo 5.4.1 setup.
> Replication was down for a few days and now I have almost 8 million rows to be moved between servers.
> Because of that, the operation takes more than 1 hour. Until now I had a firewall problem: after about 1 to 1.5 hours the connection was cut and the transaction was restarted.

So did you fix the timeout issue by adjusting the TCP keepalive settings (tcp_keepalives_idle, tcp_keepalives_interval, tcp_keepalives_count) in your postgresql.conf file?  I’ve had to do that before with some long-running Slony operations where there were long periods with no data being transferred over the connections.  That should keep the connection going even if there were long waits in the transfer.  (Though I’d be a little surprised if there were pauses of that length without *any* data transfer.)
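
For reference, a minimal sketch of what such a change might look like in postgresql.conf. The three parameter names are standard PostgreSQL settings; the values are illustrative assumptions only and would need tuning so that probes are sent more often than the firewall's idle timeout:

```
# postgresql.conf (illustrative values, not recommendations)
tcp_keepalives_idle = 60       # seconds of inactivity before the first keepalive probe
tcp_keepalives_interval = 10   # seconds between probes that go unacknowledged
tcp_keepalives_count = 6       # unacknowledged probes before the connection is considered dead
```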

> Now I have fixed that but I got this error:
> (2498) [Wed Oct  5 12:35:20 2016] CTL Warning: Kid 2525 is not responding, will respawn
> (2498) [Wed Oct  5 12:35:20 2016] CTL Old syncrun entry removed during resurrection, start time was 2016-10-05 11:12:45.165723+03
> (6411) [Wed Oct  5 12:35:20 2016] KID (ccAclSync) New kid, sync "ccAclSync" alive=1 Parent=2498 PID=6411 kicked=1
> (6411) [Wed Oct  5 12:35:20 2016] KID (ccAclSync) Overwriting /var/run/bucardo/bucardo.kid.sync.ccAclSync.pid: old process was ?

The messages you point out appear to be informational rather than a sign of an ongoing error; this is the message you get if the Kid process no longer exists.  Now, if you are getting this message repeatedly and the Kid process is never able to run, that’s a different story.  That would indicate that the Kid process is dying while trying to do the actual replication.  My guess right now is that it is a residue of the earlier issue you had.

> Is there any way that I could increase the kid/sync timeout? Maybe kick the sync manually with the timeout parameter?

BTW, there is no timeout setting in Bucardo for the Kid sync.  The answer here is to figure out why the Kid is dying if it’s other than the timeout issue, and fix that.
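
One cause worth ruling out when a Kid keeps dying is memory exhaustion on the Bucardo host. The following is a minimal sketch, assuming a Linux host with /proc available and assuming the Kid's PID file (the path is taken from the log lines above) holds the PID as its first token. The function names are hypothetical helpers, not part of Bucardo:

```python
import re
from pathlib import Path


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def kid_rss_kb(pid_file):
    """Read a PID from pid_file and return that process's resident memory in kB.

    Returns None if the PID file or the process cannot be read.
    Assumption: the first whitespace-separated token in the file is the PID.
    """
    try:
        pid = int(Path(pid_file).read_text().split()[0])
        status = Path(f"/proc/{pid}/status").read_text()
    except (OSError, ValueError, IndexError):
        return None
    return parse_vm_rss_kb(status)


if __name__ == "__main__":
    # PID file path as it appears in the Bucardo log for the ccAclSync sync.
    print(kid_rss_kb("/var/run/bucardo/bucardo.kid.sync.ccAclSync.pid"))
```

Running this periodically during a long sync and watching whether the number climbs toward the machine's total RAM would show whether the Kid is running out of memory before it disappears.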

HTH,

David
--
David Christensen
End Point Corporation
david at endpoint.com
785-727-1171




   