[Bucardo-general] support for loss of connectivity to slave server/slave db going down

Wed Feb 3 16:06:40 UTC 2010

On Wed, Jan 20, 2010 at 10:50:06AM -0800, Omar Mehmood wrote:
> In the test scenario, when I restart the slave server and the bucardo 
> processes on each master, approximately 300k un-replicated rows 
> (approximately 150k from each server) are correctly replicated over, 
> but the count doesn't match up exactly-- there are a small number of 
> rows (2395 in this example) that didn't get replicated.

This is very hard to debug from here. Can you reduce it to a simple test 
case? Is there anything about those rows that looks different? Do their 
times (bucardo_delta.txntime) correspond to anything?
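
As a rough sketch of one way to check this (not something Bucardo provides): take the primary keys from the master and slave tables, compute the difference, and bucket the txntime of the missing rows by minute to see whether they cluster around the outage or the restart. The function names and the in-memory sample data below are hypothetical; only bucardo_delta.txntime comes from Bucardo itself.

```python
from collections import Counter
from datetime import datetime


def missing_keys(master_keys, slave_keys):
    """Return keys present on the master but absent on the slave, sorted."""
    return sorted(set(master_keys) - set(slave_keys))


def bucket_by_minute(txntimes):
    """Count timestamps per minute so that clusters stand out."""
    counts = Counter(t.replace(second=0, microsecond=0) for t in txntimes)
    return sorted(counts.items())


# Hypothetical sample data: primary key -> txntime on the master.
# Keys 3 and 4 never reached the slave in this made-up example.
master = {
    1: datetime(2010, 1, 20, 18, 27, 50),
    2: datetime(2010, 1, 20, 18, 28, 5),
    3: datetime(2010, 1, 20, 18, 28, 20),
    4: datetime(2010, 1, 20, 18, 28, 40),
}
slave = [1, 2]

lost = missing_keys(master.keys(), slave)
for minute, count in bucket_by_minute(master[k] for k in lost):
    print(minute, count)
```

If the missing rows all fall within a minute or two of the ping failure or the respawn, that narrows down where to look.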

> > Bucardo is set up to restart itself by default, so if it's not, that's a
> > bug we need to address.
> 
> In the test scenario, if I shut down the slave server DBMS, the 
> Bucardo processes die (the Bucardo setups are located on the master servers).
...
> [Wed Jan 20 18:28:21 2010]  MCP Warning: Killed (line 890): Ping failed for remote database server
> [Wed Jan 20 18:28:21 2010]  MCP Database problem, will respawn after a short sleep: 15
...
> [Wed Jan 20 18:28:36 2010]  MCP Respawn attempt: /usr/local/bin/bucardo_ctl start "Attempting automatic respawn after MCP death"

This looks normal. Does Bucardo fail to restart after that line?
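
A quick way to answer that from the log, sketched under assumptions: find the last respawn attempt and look at whatever was logged after it. The marker string is taken from the excerpt above; the helper name is hypothetical, and the lines would come from wherever your Bucardo log is written.

```python
def lines_after_last_respawn(log_lines, marker="MCP Respawn attempt"):
    """Return the log lines that follow the last respawn attempt, if any."""
    last = None
    for i, line in enumerate(log_lines):
        if marker in line:
            last = i
    return [] if last is None else log_lines[last + 1:]


# Sample input: the three log lines quoted in this thread.
sample = [
    "[Wed Jan 20 18:28:21 2010]  MCP Warning: Killed (line 890): "
    "Ping failed for remote database server",
    "[Wed Jan 20 18:28:21 2010]  MCP Database problem, "
    "will respawn after a short sleep: 15",
    "[Wed Jan 20 18:28:36 2010]  MCP Respawn attempt: "
    "/usr/local/bin/bucardo_ctl start "
    '"Attempting automatic respawn after MCP death"',
]

tail = lines_after_last_respawn(sample)
print(len(tail), "line(s) logged after the last respawn attempt")
```

An empty result means nothing was logged after the respawn attempt, which would suggest the restart itself did not come up.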

-- 
Greg Sabino Mullane greg at endpoint.com
End Point Corporation
PGP Key: 0x14964AC8