[Bucardo-general] Master to multi slave replication (OleksandrBakuntsev at Eaton.com)

Wed Dec 4 18:30:27 UTC 2013

I've been mulling this over as well.

Greg Sabino Mullane wrote:
> On Wed, Dec 04, 2013 at 07:38:22AM -0500, Jonathan Brinkman wrote:
>   
>> I agree this is a huge problem. I have a master with 20 slaves in a dbgroup,
>> using bucardo 4.5, and if any one of the slaves goes offline then ALL syncs
>> break, even those that have nothing to do with the missing slave or that
>> dbgroup. This seriously undermines the value of bucardo... and it is very
>> uncomfortable explaining to our clients why all the syncs are stopped until
>> someone goes to fix the offline slave server.
>>
>> Is this fixed in bucardo 5?
>>     
>
> No, it is not fixed in B5. It's on the radar, but limited resources mean I 
> will get to it at some point, but I don't know when that will be. Here's a 
> quick solution overview to kick around:
>
> * We already handle the cases where the ctl and kid lose connection to a 
> server - they just respawn
>
> * If the MCP loses connection to a server, the action depends on a few things:
>
> - If that server is used as a source, we deactivate all syncs using it as a 
> source.
> - If the server is only used as a target, we drive on (if a new flag is set), 
> assuming there are other targets. If it is the only target, we deactivate the 
> sync.
> - Periodically we try to reconnect to the downed servers. Once up, we reactivate 
> the sync and/or add the server back as a target.
>   
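
The MCP behaviour proposed above can be sketched roughly as follows.
This is only a minimal illustration in Python, not Bucardo code (Bucardo
itself is Perl). The names Sync, continue_without_target and evaluate
are hypothetical, and continue_without_target stands in for the "new
flag" mentioned in the quoted text.

```python
from dataclasses import dataclass


@dataclass
class Sync:
    """Hypothetical, simplified view of a sync as the MCP might see it."""
    name: str
    sources: set
    targets: set
    continue_without_target: bool = False  # stands in for the proposed new flag


def evaluate(sync, down):
    """Return (active, usable_targets) given the set of unreachable servers.

    Meant to be re-run periodically, so a server that comes back is
    picked up again simply by no longer being in `down`.
    """
    # Any downed source deactivates the sync outright.
    if sync.sources & down:
        return False, set()
    live = sync.targets - down
    # Every target reachable: run normally.
    if live == sync.targets:
        return True, live
    # Some targets down: drive on only if the flag is set and
    # at least one other target is still reachable.
    if sync.continue_without_target and live:
        return True, live
    return False, set()
```

Because evaluate() keeps no state of its own, "reactivate the sync
and/or add the server back as a target" falls out of calling it again
once the reconnect attempt succeeds.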

Would it be possible to write a delta table for each target host+table?
Either something similar to the current delta table but with an
additional column for the target DB in the sync (and therefore
duplicated rows for each target), or a separate target+host table, with
a new table for each target. The latter would be useful for separating
out master-master syncs.
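
To make the first variant concrete, here is a rough sketch of a delta
table with an extra target column, using an in-memory SQLite database
from Python purely for illustration. The table layout and the function
names (make_delta_store, record_change, pending_for) are assumptions of
mine and do not reflect Bucardo's actual schema.

```python
import sqlite3


def make_delta_store():
    """Create a hypothetical delta table with one row per (change, target)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE delta ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " target TEXT NOT NULL,"
        " tablename TEXT NOT NULL,"
        " pk TEXT NOT NULL)"
    )
    return conn


def record_change(conn, targets, tablename, pk):
    """Duplicate the delta row once for every target in the sync."""
    conn.executemany(
        "INSERT INTO delta (target, tablename, pk) VALUES (?, ?, ?)",
        [(t, tablename, str(pk)) for t in targets],
    )
    conn.commit()


def pending_for(conn, target):
    """Return the changes a single target still has to apply, oldest first."""
    cur = conn.execute(
        "SELECT tablename, pk FROM delta WHERE target = ? ORDER BY id",
        (target,),
    )
    return cur.fetchall()
```

Each target then only ever looks at (and later deletes) rows carrying
its own name, at the cost of storing every change once per target.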

Then, instead of having kids and controllers for each target, the target
could run its own 'target daemon' that connects to the master and
cleans up its own rows once it has been able to sync. If the target is
part of a multi-master sync (i.e. master-master-slave in the same sync),
the system would have to operate as it does now, so that the target
daemon doesn't get to the changed rows before the sources have finished
conflict resolution.

That way you could have a multi-master cluster operating as it does now,
plus an additional sync in which one or more of the masters acts as
master to multiple slaves. Each slave is independent, so if its host
dies, only its sync dies, and its rows are left to be cleaned up later
or synced when the host returns.
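
A minimal sketch of the 'target daemon' idea, in Python with plain
in-memory structures standing in for the master's per-target delta
rows. DeltaQueue, target_daemon_pass and apply_row are hypothetical
names for illustration only.

```python
class DeltaQueue:
    """Master-side pending changes, kept separately per target (hypothetical)."""

    def __init__(self, targets):
        self.pending = {t: [] for t in targets}
        self._next_id = 0

    def record(self, tablename, pk):
        # One entry per target, so each target can be cleaned up independently.
        self._next_id += 1
        for rows in self.pending.values():
            rows.append((self._next_id, tablename, pk))

    def fetch(self, target):
        return list(self.pending[target])

    def purge(self, target, up_to_id):
        self.pending[target] = [
            r for r in self.pending[target] if r[0] > up_to_id
        ]


def target_daemon_pass(queue, target, apply_row):
    """One pull cycle run by the target itself; returns the rows applied.

    If apply_row raises, purge() is never reached and the rows stay
    queued for the next attempt.
    """
    rows = queue.fetch(target)
    if not rows:
        return 0
    for _id, tablename, pk in rows:
        apply_row(tablename, pk)
    queue.purge(target, rows[-1][0])
    return len(rows)
```

A slave that is offline simply never calls target_daemon_pass(), so its
rows pile up on the master while the other slaves carry on unaffected.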

One thought that covers most of the locking/delta questions and also
solves the master-master-slave issue: give every slave its own tables,
and have the multi-master sync provide the deltas after each sync is
complete. That is, the sync process (the master kid) would create the
deltas for the target instead of the triggers, so that conflict
resolution is done and the final rows are replicated to all DBs before
the target (slave) reads the deltas and row data to replicate to
itself. You could also probably include a flag on the DB table to say
'let the MCP take care of this sync' or 'let the remote DB pull the
data'. Vacuuming of the delta table would be done by whichever process
is performing the replication.
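
Roughly what I have in mind, as a Python sketch rather than anything
resembling real Bucardo internals. SyncMode, finish_master_sync,
vacuum_owner and the resolve callback are all made-up names, and the
conflict resolution itself is left to whatever the sync already uses.

```python
from enum import Enum


class SyncMode(Enum):
    MCP = "mcp"    # 'let the MCP take care of this sync'
    PULL = "pull"  # 'let the remote DB pull the data'


def finish_master_sync(changes_by_master, resolve, slave_modes, slave_deltas):
    """Resolve conflicts first, then publish deltas for the slaves.

    changes_by_master: {master: {pk: value}}
    resolve: callable taking [(master, value), ...] and returning the winner
    slave_modes: {slave: SyncMode}
    slave_deltas: {slave: [pk, ...]}, updated in place for pull-mode slaves
    """
    # Collect every candidate value for each primary key across the masters.
    candidates = {}
    for master, rows in changes_by_master.items():
        for pk, value in rows.items():
            candidates.setdefault(pk, []).append((master, value))
    # Conflict resolution decides the final row for each key.
    final = {pk: resolve(cands) for pk, cands in candidates.items()}
    # Only now are deltas written for slaves that pull for themselves,
    # so they can never see a row that has not been resolved yet.
    push_now = []
    for slave, mode in slave_modes.items():
        if mode is SyncMode.PULL:
            slave_deltas.setdefault(slave, []).extend(sorted(final))
        else:
            push_now.append(slave)
    return final, push_now


def vacuum_owner(mode):
    """Whichever process performs the replication also vacuums the deltas."""
    return "mcp" if mode is SyncMode.MCP else "target daemon"
```

The triggers would then only feed the master-master sync, and the slave
delta tables become an output of that sync rather than of the triggers.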

Thoughts?

Michelle

-- 
Michelle Sullivan
http://www.mhix.org/