[Bucardo-general] If a db is unreachable then nothing works!

Rainer Brestan rainer.brestan at gmx.net
Wed Apr 18 19:00:44 UTC 2012

I asked a similar question on this list a year and a half ago, and the reaction was the same.
The suggested solution is not acceptable for systems that run unattended.

I have solved this issue for Bucardo 4 (B4) version 4.4.5, but the method should work for any B4 release.

Consider the following setup.
Sync set S1 has master DB A and slave DB B.
Sync set S2 has master DB A and slave DB C.
Whenever DB C fails, every sync set stops working, even one whose databases are all online (here S1). There is no automatic retry. The data to be synced is not lost (it stays in the delta table on DB A), so an automatic retry would transfer the remaining data as soon as DB C comes online again.

Therefore, I have modified Bucardo.pm to support a retry mechanism for each individual sync set.

Basically, it does the following:
- The function connect_database no longer terminates when a database is unavailable; it reports the database as "offline".
- MCP regularly checks only the main database, not the source and target databases.
- CTL removes each "offline" database from its list (but only in memory, not in the configuration).
- If CTL has no source database, it terminates.
- If CTL has no target database left, it terminates.
- If KID has no source or no target, it terminates.
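
As a rough illustration of the rules above (this is not the actual change to Bucardo.pm, which is Perl), the pruning logic can be modeled in a few lines of Python. All names here (Sync, prune_for_controller, the reachable set) are invented for the sketch; only connect_database and the "offline" status come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Sync:
    """Hypothetical stand-in for one sync set: one source, several targets."""
    name: str
    source: str
    targets: list


def connect_database(name, reachable):
    """Report 'offline' for an unreachable database instead of terminating."""
    return "online" if name in reachable else "offline"


def prune_for_controller(sync, reachable):
    """Return the in-memory (source, live targets) for a controller,
    or None if the controller should terminate and be retried later."""
    if connect_database(sync.source, reachable) == "offline":
        return None  # no source database: terminate
    live = [t for t in sync.targets
            if connect_database(t, reachable) == "online"]
    if not live:
        return None  # no target database left: terminate
    # The stored configuration (sync.targets) is deliberately left untouched.
    return sync.source, live


syncs = [Sync("S1", "A", ["B"]), Sync("S2", "A", ["C"])]
reachable = {"A", "B"}  # assumption for the example: DB C is down
plan = {s.name: prune_for_controller(s, reachable) for s in syncs}
```

In this model S1 keeps running with A and B, while the controller for S2 terminates and would be restarted by MCP until DB C is reachable again.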

MCP detects a dead CTL and tries to restart it, so the retry is simply the CTL restart, which already existed in B4.
CTL detects a dead KID and restarts it; this also already existed in B4.

This patch brought up some other issues, which have also been solved:
- The KID restart happens as fast as possible, so when the KID keeps dying, the restart loop consumes 100% CPU. This was solved by using ctl_checkonkids_time as a delay before restarting KIDs.
- CTL termination was not correct: it called die and skipped the cleanup_controller call.
- The handling of dead columns in the attnum ordering was done on colinfo instead of targetcolinfo.
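
The effect of the restart delay can be shown with a small Python model. This is only a sketch: in Bucardo the KIDs are separate processes, whereas here a dying KID is simulated by a function that raises, and supervise, flaky_kid and the value 10 are made up for the example.

```python
import time


def supervise(start_kid, checkonkids_time, max_restarts, sleep=time.sleep):
    """Start a worker; if it dies, wait checkonkids_time before restarting it.

    Returns the number of restarts that were needed (or attempted).
    """
    restarts = 0
    while restarts < max_restarts:
        try:
            start_kid()
            return restarts
        except RuntimeError:
            restarts += 1
            # Without this pause the loop would spin and burn a full CPU.
            sleep(checkonkids_time)
    return restarts


attempts = []


def flaky_kid():
    """Simulated KID that dies twice (target offline), then runs."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("target database offline")


delays = []  # record the pauses instead of really sleeping
restarts = supervise(flaky_kid, checkonkids_time=10, max_restarts=5,
                     sleep=delays.append)
```

Passing sleep as a parameter is only there so the example runs instantly; a real supervisor would simply sleep.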

I have also added a new copy type named "insert" (the field was already present in the sync set table).
This copy type does not use the COPY statement for data transfer; it uses INSERT. The reason is that some middleware products for PostgreSQL have problems with COPY FROM and COPY TO.
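
To show the idea of transferring rows with plain INSERT statements instead of COPY, here is a minimal Python sketch. It uses the standard sqlite3 module with two in-memory databases purely as a stand-in for the PostgreSQL source and target; the function copy_rows_with_insert and the table t are hypothetical and are not taken from the patch.

```python
import sqlite3


def copy_rows_with_insert(src, dst, table, columns):
    """Read rows from src and write them to dst using parameterized INSERTs.

    table and columns are assumed to be trusted identifiers, because they
    are interpolated directly into the SQL text.
    """
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    rows = src.execute(f"SELECT {cols} FROM {table}").fetchall()
    dst.executemany(f"INSERT INTO {table} ({cols}) VALUES ({marks})", rows)
    dst.commit()
    return len(rows)


# Two in-memory databases with the same table definition.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for db in (src, dst):
    db.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
src.executemany("INSERT INTO t (id, val) VALUES (?, ?)", [(1, "a"), (2, "b")])
src.commit()

copied = copy_rows_with_insert(src, dst, "t", ["id", "val"])
```

Row-by-row INSERT is generally slower than a bulk COPY, so this kind of transfer makes sense mainly where COPY cannot be used.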


-------- Original Message --------
> Date: Tue, 17 Apr 2012 18:08:45 -0400
> From: Greg Sabino Mullane <greg at endpoint.com>
> To: john Edison <rifedit at yahoo.fr>
> CC: "bucardo-general at bucardo.org" <bucardo-general at bucardo.org>
> Subject: Re: [Bucardo-general] If a db is unreachable then nothing works!

> > If a db is unreachable then nothing works!
> Yes, that's the general idea - we don't allow partial replication 
> sets to exist. That said, you can change the status of a known 
> inactive database from 'active' to 'inactive'. I think that will 
> work on a bucardo4 setup, but not sure.
> If anyone on the list has ideas about how to solve this problem 
> in general, let us know. :)
> -- 
> Greg Sabino Mullane greg at endpoint.com
> End Point Corporation
> PGP Key: 0x14964AC8

