[Bucardo-general] support for loss of connectivity to slave server/slave db going down

Thu Feb 4 22:49:48 UTC 2010

> > In the test scenario, when I restart the slave server and the bucardo
> > processes on each master, approximately 300k un-replicated rows
> > (approximately 150k from each server) are correctly replicated over,
> > but the count doesn't match up exactly-- there are a small number of
> > rows (2395 in this example) that didn't get replicated.
>
> This is very hard to debug from here. Can you simplify it to a test
> case. Is there anything about those rows that looks different? Do their
> times (bucardo_delta.txntime) correspond to anything?

The rows were exactly the same except for the primary key value. I've since wiped Bucardo off the test environment, so I can't tell you anything more about the test runs beyond describing the setup.
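
For anyone trying to reduce a similar mismatch to a test case, one starting point is to diff the primary keys on each side so the missing rows can be looked up individually. This is only a minimal sketch: it assumes the key values have already been fetched from the master and slave tables into Python lists, and the function name and sample values are hypothetical, not part of Bucardo.

```python
def missing_keys(master_keys, slave_keys):
    """Return primary keys present on the master but absent on the slave, sorted."""
    return sorted(set(master_keys) - set(slave_keys))


# Hypothetical sample data standing in for the primary key column of the
# replicated table as read from each server.
master = [1, 2, 3, 4, 5, 6]
slave = [1, 2, 4, 6]
print(missing_keys(master, slave))
```

The resulting keys could then be checked against the delta table's txntime values, as suggested in the quoted question above.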

> > > Bucardo is setup to restart itself by default, so if it's not, that's a
> > > bug we need to address.
> >
> > In the test scenario, if I shutdown the slave server DBMS, the
> > Bucardo processes die (the Bucardo setups are located on the master servers).
> ...
> > [Wed Jan 20 18:28:21 2010]  MCP Warning: Killed (line 890): Ping failed for remote database server
> > [Wed Jan 20 18:28:21 2010]  MCP Database problem, will respawn after a short sleep: 15
> ...
> > [Wed Jan 20 18:28:36 2010]  MCP Respawn attempt: /usr/local/bin/bucardo_ctl start "Attempting automatic respawn after MCP death"
>
> This looks normal. Does Bucardo fail to restart after that line?

Correct: the Bucardo processes died and did not restart. That was the tail end of the log.
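
A log that ends on a respawn attempt, as this one did, can be detected mechanically. The sketch below is an illustration only: it assumes log lines in the "[timestamp]  message" form quoted above, and the function name is hypothetical rather than anything Bucardo provides.

```python
import re

# Matches lines such as "[Wed Jan 20 18:28:36 2010]  MCP Respawn attempt: ..."
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<msg>.*)$")


def respawn_unconfirmed(log_lines):
    """Return True if the last parseable entry is a respawn attempt,
    i.e. nothing was logged after the restart was tried."""
    last_msg = None
    for line in log_lines:
        match = LINE_RE.match(line.strip())
        if match:
            last_msg = match.group("msg")
    return last_msg is not None and last_msg.startswith("MCP Respawn attempt")


log = [
    "[Wed Jan 20 18:28:21 2010]  MCP Warning: Killed (line 890): Ping failed for remote database server",
    "[Wed Jan 20 18:28:21 2010]  MCP Database problem, will respawn after a short sleep: 15",
    '[Wed Jan 20 18:28:36 2010]  MCP Respawn attempt: /usr/local/bin/bucardo_ctl start "Attempting automatic respawn after MCP death"',
]
print(respawn_unconfirmed(log))
```

A check like this could run from cron and raise an alert when the restart never produced further log output.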

Anyway, I gave up on trying to get it to work and ended up writing my own code to perform replication. I would still like to thank you for all your help.

Omar

--- On Wed, 2/3/10, Greg Sabino Mullane <greg at endpoint.com> wrote:

> From: Greg Sabino Mullane <greg at endpoint.com>
> Subject: Re: [Bucardo-general] support for loss of connectivity to slave server/slave db going down
> To: "Omar Mehmood" <omarmehmood at yahoo.com>
> Cc: bucardo-general at bucardo.org
> Date: Wednesday, February 3, 2010, 11:06 AM
>
> On Wed, Jan 20, 2010 at 10:50:06AM -0800, Omar Mehmood wrote:
> > In the test scenario, when I restart the slave server and the bucardo
> > processes on each master, approximately 300k un-replicated rows
> > (approximately 150k from each server) are correctly replicated over,
> > but the count doesn't match up exactly-- there are a small number of
> > rows (2395 in this example) that didn't get replicated.
>
> This is very hard to debug from here. Can you simplify it to a test
> case. Is there anything about those rows that looks different? Do their
> times (bucardo_delta.txntime) correspond to anything?
>
> > > Bucardo is setup to restart itself by default, so if it's not, that's a
> > > bug we need to address.
> >
> > In the test scenario, if I shutdown the slave server DBMS, the
> > Bucardo processes die (the Bucardo setups are located on the master servers).
> ...
> > [Wed Jan 20 18:28:21 2010]  MCP Warning: Killed (line 890): Ping failed for remote database server
> > [Wed Jan 20 18:28:21 2010]  MCP Database problem, will respawn after a short sleep: 15
> ...
> > [Wed Jan 20 18:28:36 2010]  MCP Respawn attempt: /usr/local/bin/bucardo_ctl start "Attempting automatic respawn after MCP death"
>
> This looks normal. Does Bucardo fail to restart after that line?
> 
> -- 
> Greg Sabino Mullane greg at endpoint.com
> End Point Corporation
> PGP Key: 0x14964AC8
>