[Bucardo-general] Kid is not responding,
david at endpoint.com
Wed Oct 5 15:13:21 UTC 2016
> Hi all,
> I have a four-master Bucardo 5.4.1 setup.
> The replication was down for a few days and now I have almost 8 million rows to be moved between servers.
> Because of that, the operation takes more than 1 hour. Until now I had a firewall problem: after about 1 to 1.5 hours the connections were cut and the transaction was restarted.
So did you fix the timeout issue by adjusting the TCP keepalive settings (tcp_keepalives_idle, tcp_keepalives_interval, tcp_keepalives_count) in your postgresql.conf file? I’ve had to do that before with some long-running Slony operations that had long stretches with no data passing over the connections. Keepalive probes should keep the connection open through those idle stretches, even when the transfer stalls for a long time. (Though I’d be a little surprised if there were pauses of that length without *any* data transfer.)
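One plausible reason the default settings did not help can be worked through with a little arithmetic. This is a sketch under stated assumptions: a Linux server, where net.ipv4.tcp_keepalive_time defaults to 7200 seconds, and a hypothetical firewall that drops connections idle for 3600 seconds. In postgresql.conf, tcp_keepalives_idle = 0 (its default) defers to the operating system. Other platforms have different defaults, and the 600-second value is only an illustration, not a recommendation. The function names are made up for this example.

```python
# Assumption: Linux default for net.ipv4.tcp_keepalive_time, in seconds (2 hours).
LINUX_DEFAULT_KEEPALIVE_IDLE = 7200


def first_probe_after(tcp_keepalives_idle: int,
                      os_default: int = LINUX_DEFAULT_KEEPALIVE_IDLE) -> int:
    """Seconds of silence before the first keepalive probe is sent.

    In postgresql.conf, tcp_keepalives_idle = 0 means
    "use the operating system's default".
    """
    return tcp_keepalives_idle if tcp_keepalives_idle > 0 else os_default


def keeps_firewall_state(tcp_keepalives_idle: int, firewall_idle_timeout: int,
                         os_default: int = LINUX_DEFAULT_KEEPALIVE_IDLE) -> bool:
    """True if a probe goes out before the firewall drops the idle connection.

    Each acknowledged probe restarts the idle timer, so a probe is sent after
    every idle period of this length while the connection stays quiet.
    """
    return first_probe_after(tcp_keepalives_idle, os_default) < firewall_idle_timeout


# Hypothetical firewall that drops connections idle for one hour.
FIREWALL_IDLE_TIMEOUT = 3600

print(keeps_firewall_state(0, FIREWALL_IDLE_TIMEOUT))    # False: first probe at 7200 s
print(keeps_firewall_state(600, FIREWALL_IDLE_TIMEOUT))  # True: probe after 600 s of idle
```

With the defaults, the first probe would only go out after two hours of silence, well past a one-hour firewall cutoff; lowering tcp_keepalives_idle below the firewall's timeout keeps traffic flowing during long waits.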
> Now I have fixed that, but I got this error:
> (2498) [Wed Oct 5 12:35:20 2016] CTL Warning: Kid 2525 is not responding, will respawn
> (2498) [Wed Oct 5 12:35:20 2016] CTL Old syncrun entry removed during resurrection, start time was 2016-10-05 11:12:45.165723+03
> (6411) [Wed Oct 5 12:35:20 2016] KID (ccAclSync) New kid, sync "ccAclSync" alive=1 Parent=2498 PID=6411 kicked=1
> (6411) [Wed Oct 5 12:35:20 2016] KID (ccAclSync) Overwriting /var/run/bucardo/bucardo.kid.sync.ccAclSync.pid: old process was ?
The messages you point out appear to be informational rather than indicative of an ongoing error; this is the message you get when the Kid process no longer exists. Now, if you are getting this message repeatedly and the Kid process is never able to run, that’s a different story: it would indicate that the Kid process is dying while trying to do the actual replication. My guess right now is that this is residue from the earlier issue you had.
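If you want to confirm for yourself whether a Kid process still exists, a rough check is to read its pid file and send signal 0 to the stored PID, which tests for the process without actually signalling it. This is a hypothetical Unix-only diagnostic, not Bucardo's own code; the function names are made up, and the pid file path is taken from the log excerpt above.

```python
import os
from typing import Optional


def parse_pid(text: str) -> Optional[int]:
    """Return the PID in a pid file's contents, or None if it is not a positive integer."""
    try:
        pid = int(text.strip())
    except ValueError:
        return None  # e.g. "?" or an empty file
    return pid if pid > 0 else None


def pid_is_alive(pid: int) -> bool:
    """Signal 0 runs the existence/permission checks without delivering a signal (Unix only)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def kid_pid_status(pid_path: str) -> str:
    """Hypothetical helper: summarize what a Kid pid file points at."""
    try:
        with open(pid_path) as fh:
            pid = parse_pid(fh.read())
    except FileNotFoundError:
        return "no pid file"
    if pid is None:
        return "unreadable pid"
    return "alive" if pid_is_alive(pid) else "not running"


# Path from the log excerpt; adjust for your installation.
print(kid_pid_status("/var/run/bucardo/bucardo.kid.sync.ccAclSync.pid"))
```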
> Is there any way that I could increase the kid/sync timeout? Maybe kick the sync manually with the timeout parameter?
BTW, there is no timeout setting in Bucardo for the Kid sync. The answer here is to figure out why the Kid is dying, if the cause is something other than the timeout issue, and fix that.
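To tell a one-off respawn from a Kid that keeps dying, it helps to count how often the "is not responding" warning shows up. The sketch below is a hypothetical helper that scans log lines for the warning format shown in the quoted excerpt; other Bucardo versions or logging settings may format lines differently, so treat the regular expression as an assumption.

```python
import re
from typing import Iterable, List, Tuple

# Assumed format, copied from the excerpt:
# (2498) [Wed Oct 5 12:35:20 2016] CTL Warning: Kid 2525 is not responding, will respawn
RESPAWN_RE = re.compile(
    r"^\(\d+\) \[(?P<ts>[^\]]+)\] CTL Warning: Kid (?P<kid>\d+) is not responding"
)


def respawn_warnings(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (timestamp, kid PID) for every respawn warning in the given log lines."""
    hits = []
    for line in lines:
        match = RESPAWN_RE.match(line)
        if match:
            hits.append((match.group("ts"), int(match.group("kid"))))
    return hits
```

You could pass it an open log file, since file objects iterate line by line. A single hit after an outage fits the residue explanation; a steady stream of them means the Kid is dying on each attempt, and the lines logged just before each warning are the place to look for the cause.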
End Point Corporation
david at endpoint.com