[Bucardo-general] Syncs are aborted if one of the involved DBs is offline

Mon Apr 8 15:45:51 UTC 2013

On 4/8/2013 8:27 AM, Greg Sabino Mullane wrote:
>> sync is performed.  All my remote nodes are slave only so it's not too
>> much of an issue that way for me, however I do have a cluster of 4
>> masters (2 east coast, 2 west coast) that should a master fail it should
>> be placed in read-only on startup whilst the DB is synced with
>> outstanding changes.  Greg, would this be possible to look at?
> Well, the case with one or more "targets" being down in a simple
> single-source-many-targets is fixable, and I will do that once I get
> some tuits. The multi-source case is a lot trickier. It wouldn't need
> to necessarily be put into any mode on startup, it could simply run
> when ready. The problem is how to handle conflicts, and how to deal
> with the fact that the "dead" source db is going to have some
> stored up delta rows that may be quite outdated when compared to the
> other databases. I will think this over some more. Feel free to let
> me know what the ideal behavior would be for your particular situation(s),
> so I can keep that in mind.
I guess a pertinent question would be how the databases are arranged. Is
there a single central database, with the ship-based databases arranged
in hub-and-spoke fashion?
If so, we have the exact same problem, which we solved by setting up 
each remote location as a two-master sync with the central db, and each 
remote runs its own bucardo process (no central bucardo process on the 
hub). The tables we want synchronized across the entire network are set 
with makedelta.

So when one of the remotes fires up, so does its own bucardo process, and
it brings itself into sync with the central db, which is itself kept up
to date by the other remotes. We currently have this working with
multiple remotes. It has some trickiness, and some things we only found
by trial and error, but I like it. We have a couple of outstanding
questions that I asked Greg and the mailing list about a few weeks ago,
but nothing that is actually stopping the process.
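
To make the flow above concrete, here is a toy Python model of it. This
is only a sketch of the idea, not how Bucardo is implemented: the Node
and PairSync classes, the delta list and the origin tag are all made up
for illustration, and there is no conflict handling at all (whichever
sync runs last wins). The point it shows is that changes a remote pushes
to the hub get recorded on the hub as deltas, which is roughly the job
makedelta does for us, so the other remotes pick them up on their next
run.

```python
class Node:
    """A database reduced to a dict of rows plus a log of changes (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}    # primary key -> value
        self.deltas = []  # (pk, value, origin) tuples, oldest first

    def write(self, pk, value):
        """A local application write: change the row and log a delta."""
        self.rows[pk] = value
        self.deltas.append((pk, value, self.name))


class PairSync:
    """Two-way sync between one remote and the hub, driven from the remote."""

    def __init__(self, remote, hub):
        self.remote = remote
        self.hub = hub
        self.remote_pos = 0  # how far into the remote's delta log we have read
        self.hub_pos = 0     # how far into the hub's delta log we have read

    def run(self):
        # Push: apply the remote's new changes to the hub and record them
        # there as deltas so the other remotes' syncs will see them.
        for pk, value, origin in self.remote.deltas[self.remote_pos:]:
            self.hub.rows[pk] = value
            self.hub.deltas.append((pk, value, origin))
        self.remote_pos = len(self.remote.deltas)

        # Pull: apply hub changes that did not originate on this remote.
        # These are not logged on the remote, so nothing echoes back.
        for pk, value, origin in self.hub.deltas[self.hub_pos:]:
            if origin != self.remote.name:
                self.remote.rows[pk] = value
        self.hub_pos = len(self.hub.deltas)


hub = Node("hub")
ship_a = Node("ship_a")
ship_b = Node("ship_b")
sync_a = PairSync(ship_a, hub)
sync_b = PairSync(ship_b, hub)

ship_a.write("a-1", "alice")  # ship_a is online and syncs right away
sync_a.run()
ship_b.write("b-1", "bob")    # ship_b was offline and only syncs later
sync_b.run()                  # ship_b catches up and pushes its own row
sync_a.run()                  # ship_a picks up ship_b's row via the hub
```

After those three runs all three nodes hold the same two rows, and
neither remote ever needed the other one to be reachable.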

Of course, some of our config is what makes this work for us. For
example, our primary key scheme ensures we'll never have a primary key
conflict, since the PKs generated in each db are prefixed with a unique
db identifier, and our records ultimately tie back to individuals at
geographically separated events. Since people can only be in one place
at a time, the likelihood of a conflicting update being made on a
person's record from multiple disconnected databases is very low (and a
risk we're willing to take).
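
For anyone wanting to try the same trick, the prefixed-PK idea can be
sketched in a few lines of Python. This is not our actual scheme, just
an illustration: the PrefixedKeyGenerator name, the "east1"/"west1"
identifiers and the "prefix-counter" key format are all assumptions. In
a real setup the counter would more likely be a per-database sequence.

```python
import itertools


class PrefixedKeyGenerator:
    """Hands out keys of the form '<db_id>-<n>' (hypothetical format)."""

    def __init__(self, db_id):
        # The separator must not appear in the prefix, otherwise two
        # different databases could end up producing the same key.
        if not db_id or "-" in db_id:
            raise ValueError("db_id must be non-empty and contain no '-'")
        self.db_id = db_id
        self._counter = itertools.count(1)  # stands in for a local sequence

    def next_key(self):
        return f"{self.db_id}-{next(self._counter)}"


east = PrefixedKeyGenerator("east1")
west = PrefixedKeyGenerator("west1")

# Both databases generate keys independently while disconnected.
east_keys = {east.next_key() for _ in range(3)}
west_keys = {west.next_key() for _ in range(3)}
```

As long as every database has its own identifier, the two key sets can
never overlap, so inserts made while disconnected will not collide when
the syncs eventually run. It does nothing for conflicting updates to the
same row, which is the risk described above.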

Chris.