[Bucardo-general] Bucardo stops all syncs if one node is offline

Rainer Brestan rainer.brestan at gmx.net
Mon Sep 26 11:10:11 UTC 2011


Hi Gustavo,
that's exactly what I asked about in my postings on the 1st and 3rd of August.

The MCP tries every mcp_pingtime to reach all source and all target DBs.
This happens in Bucardo.pm, function mcp_main (line 896 in version 4.4.3).
If it cannot reach all remote databases, it dies (line 902).
If you shut down one slave, you will find exactly the log message from line 902.
When the MCP starts up again (after the mcp_dbproblem_sleep time), it tries to connect to all source and target DBs in function connect_database. In there you find the DBI->connect with RaiseError=>1, so if it can't connect, it dies.
Therefore the MCP is restarted endlessly until all databases are online again.
bucardo_ctl deactivate sync does not work any more either, because the MCP no longer responds to the NOTIFY event (it dies right after creation and can't catch the NOTIFY).

What I did was two hacks to the code (bad hacks, but I got no better solution from the community).
First, when the MCP starts and finds a database that is not online, it does not die; instead it removes that database from its internal list of databases, so it won't check it any more.
Second, in the code for the mcp_pingtime check, it now dies only when the bucardo database is not available, but not for any master and/or slave DB.
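
To make the idea of the two hacks concrete, here is a rough sketch of their logic. It is written in Python rather than the Perl of Bucardo.pm and is not the actual patch. The names prune_unreachable, ping_should_die and the can_connect callback are invented for illustration; only the behaviour follows the description above.

```python
from typing import Callable, List, Tuple


def prune_unreachable(
    databases: List[str],
    can_connect: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Hack 1: at startup, drop unreachable databases instead of dying.

    Returns (kept, dropped). Dropped databases are not checked again.
    """
    kept: List[str] = []
    dropped: List[str] = []
    for name in databases:
        if can_connect(name):
            kept.append(name)
        else:
            dropped.append(name)
    return kept, dropped


def ping_should_die(
    control_db: str,
    can_connect: Callable[[str], bool],
) -> bool:
    """Hack 2: on the periodic ping, die only if the bucardo database
    itself is unreachable; master and slave DBs are not considered."""
    return not can_connect(control_db)


if __name__ == "__main__":
    # Hypothetical cluster in which slave2 has been shut down.
    online = {"bucardo": True, "master1": True, "slave1": True, "slave2": False}

    def probe(name: str) -> bool:
        return online[name]

    kept, dropped = prune_unreachable(["master1", "slave1", "slave2"], probe)
    print("kept:", kept, "dropped:", dropped)
    print("die on ping:", ping_should_die("bucardo", probe))
```

The obvious drawback, and the reason I call these bad hacks, is that a dropped database stays dropped until the MCP is restarted.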

I would really appreciate it if anyone has a better solution, such as a sleeping sync that is retried in the background until it becomes available again.

Greg answered me that this is currently not implemented. The only option is to manually (or with some sort of watchdog process) change the status of the sync or database to 'inactive'.
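
A watchdog of that kind could look roughly like the sketch below. This is only my guess at how it might be done, not something Bucardo ships. The name and status columns of bucardo.db, the 'active' value, and all function names are assumptions, so check them against your schema. The code only decides which status changes are needed and builds parameterized UPDATE statements; executing them is left to the caller.

```python
import socket
from typing import Callable, Dict, List, Tuple


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def plan_status_changes(
    current: Dict[str, str],
    reachable: Callable[[str], bool],
) -> List[Tuple[str, Tuple[str, str]]]:
    """Compare each database's current status with its reachability.

    current maps database name -> 'active' or 'inactive'.
    Returns (sql, params) pairs for databases whose status must change.
    """
    # Assumed column names; verify against the real bucardo.db table.
    sql = "UPDATE bucardo.db SET status = %s WHERE name = %s"
    changes = []
    for name, status in sorted(current.items()):
        wanted = "active" if reachable(name) else "inactive"
        if wanted != status:
            changes.append((sql, (wanted, name)))
    return changes
```

Note that tcp_probe only shows that the port is open, not that PostgreSQL accepts logins, so a real watchdog would probably want an actual database connection as its check.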

Rainer

-------- Original Message --------
> Date: Wed, 21 Sep 2011 16:38:12 -0300
> From: Gustavo Tonini <gustavotonini at gmail.com>
> To: Greg Sabino Mullane <greg at endpoint.com>
> CC: bucardo-general at bucardo.org
> Subject: Re: [Bucardo-general] Bucardo stops all syncs if one node is offline

> I want all syncs related to the offline database to stay offline. Other syncs
> must continue working.
> 
> Today, a database problem makes replication unavailable for the entire
> cluster, even if there's no connection problem between most of the sites.
> 
> 
> On Wed, Sep 21, 2011 at 4:23 PM, Greg Sabino Mullane <greg at endpoint.com> wrote:
> 
> > On Wed, Sep 21, 2011 at 03:28:12PM -0300, Gustavo Tonini wrote:
> > > Hello everyone,
> > > I have a 4-node replication cluster. There are six bucardo syncs
> > > replicating data between nodes.
> > > If one node goes offline, Bucardo stops all syncs.
> > > Is there a way to fix this behavior?
> >
> > Mark the offending database as 'inactive' in the bucardo.db table. The
> > problem is that there is no universally agreed upon behaviour when a node
> > goes down. However, it's possible we could code in different solutions and
> > allow those to be configured, such that one's Bucardo instance acts as you
> > want it to.
> >
> > So in your case, what would you like it to do?
> >
> > --
> > Greg Sabino Mullane greg at endpoint.com
> > End Point Corporation
> > PGP Key: 0x14964AC8
> >
> 
> 
> 
> -- 
> Gustavo.

-- 
Empfehlen Sie GMX DSL Ihren Freunden und Bekannten und wir
belohnen Sie mit bis zu 50,- Euro! https://freundschaftswerbung.gmx.de

