[Bucardo-general] Best practice for failover

Jörn Ott joern.ott at ott-consult.de
Thu Oct 14 10:21:02 UTC 2010


Hello Greg,
hello all,

On 01.10.2010 18:24, Greg Sabino Mullane wrote:
> Looks like this may have been partially answered on IRC, but 
> for the archives I'll attempt as well:
> 
>> I am currently investigating a replication scenario for a client. This
>> scenario involves two master servers (m1, m2) and n slaves (currently
>> only one, maybe 2 or 3 later on). The bucardo database is also hosted on m1.
>>
>> At the moment, there is a swap sync between m1 and m2 and a pushdelta
>> between m1 and s1 and another one between m2 and s1. The whole setup is
>> done via a shell script I wrote, so all information is available outside
>> of the bucardo database as well.
> 
> Why are there two pushdelta syncs? If m1 and m2 are swaps, it should be 
> sufficient to have a single pushdelta from m1 or m2.

At the moment, I have a problem with that. When I only have a pushdelta
sync between master m1 and slave s01, plus a swap sync between the two
masters m1 and m2, I see the following:
A new row is created on m1 -> replication works.
That row is then edited on m2 -> the swap sync replicates the change
back to m1, but it is not replicated on to the slave.

Makedelta is turned on:

dbm1 ~ # bucardo_ctl status herd_Test_dbm1_dbm2 --dbhost 172.17.0.1
Days back: 3  User: bucardo  Database: bucardo  Host: 172.17.0.1
======================================================================
Sync name:            herd_Test_dbm1_dbm2
Current state:        idle (PID = 21007)
Type:                 swap
Source herd/database: herd_Test_dbm1 / testdatenbank_dbm1
Target database:      testdatenbank_dbm2
Tables in sync:       200
Last good:            5m 44s (time to run: 0s)
Last good time:       Oct 13, 2010 16:51:52  Target: testdatenbank_dbm2
Ins/Upd/Del:          0 / 1 / 0
Last bad:             unknown
PID file:             /var/run/bucardo/bucardo.ctl.sync.herd_Test_dbm1_dbm2.pid
PID file created:     Wed Oct 13 15:38:56 2010
Status:               active
Limitdbs:             0
Priority:             0
Checktime:            none
Overdue time:         00:00:00
Expired time:         00:00:00
Stayalive:            yes      Kidsalive: yes
Rebuild index:        0        Do_listen: no
Ping:                 yes      Makedelta: inherits/inherits
Onetimecopy:          0
dbm1 ~ # bucardo_ctl status herd_Test_dbm1_dbs01 --dbhost 172.17.0.1
Days back: 3  User: bucardo  Database: bucardo  Host: 172.17.0.1
======================================================================
Sync name:            herd_Test_dbm1_dbs01
Current state:        idle (PID = 21006)
Type:                 pushdelta
Source herd/database: herd_Test_dbm1 / testdatenbank_dbm1
Target database:      testdatenbank_dbs01
Tables in sync:       200
Last good:            6m 5s (time to run: 0s)
Last good time:       Oct 13, 2010 16:51:38  Target: testdatenbank_dbs01
Ins/Upd/Del:          1 / 0 / 1
Last bad:             1h 18m 52s (time to run: 7m 42s)
Last bad time:        Oct 13, 2010 15:38:50  Target: testdatenbank_dbs01
Latest bad reason:    CTL request
PID file:             /var/run/bucardo/bucardo.ctl.sync.herd_Test_dbm1_dbs01.pid
PID file created:     Wed Oct 13 15:38:56 2010
Status:               active
Limitdbs:             0
Priority:             0
Checktime:            none
Overdue time:         00:00:00
Expired time:         00:00:00
Stayalive:            yes      Kidsalive: yes
Rebuild index:        0        Do_listen: no
Ping:                 yes      Makedelta: inherits/inherits
Onetimecopy:          0

When I have two pushdelta syncs (one from m1, one from m2), data is
replicated correctly.
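For the archives, the two-pushdelta workaround can be set up roughly as
below. This is only a sketch using the Bucardo 4 bucardo_ctl syntax; the
herd name herd_Test_dbm2 is an assumption for illustration, and the
database/host names are taken from the status output above:

```shell
# Create a second pushdelta sync that pushes m2's changes to the slave,
# mirroring the existing herd_Test_dbm1_dbs01 sync that pushes from m1.
# (herd_Test_dbm2 is a hypothetical herd on m2 with the same 200 tables.)
bucardo_ctl add sync herd_Test_dbm2_dbs01 source=herd_Test_dbm2 \
    targetdb=testdatenbank_dbs01 type=pushdelta --dbhost 172.17.0.1

# Reload so the running Bucardo master control picks up the new sync.
bucardo_ctl reload_config --dbhost 172.17.0.1
```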

> 
>> If m2 goes down, most scenarios are clear, there is no impact, as the
>> bucardo database is on m1 and bucardo runs on m1 as well. When m2 comes
>> up again, it should catch up when I kick the sync(s).
> 
> Generally correct. Bucardo will complain loudly about m2 not being reachable, 
> but should be happy once it comes back.
> 
>> If m2 goes down permanently, I'll remove all syncs,

Here I stumbled upon another problem during my simulation runs. I can't
delete a sync (the pushdelta from m2, or the swap between m1 and m2)
when the database host m2 has gone down permanently, because the
deletion fires a function that tries to reach that host. I always get
the following message when deleting the sync:

# bucardo_ctl delete sync herd_Test_dbm1_dbm2 --dbhost dbm1
Could not delete sync "herd_Test_dbm1_dbm2"
DBD::Pg::st execute failed: ERROR:  error from Perl function
"bucardo_delete_sync": DBI
connect('dbname=testdatenbank;host=dbm2;port=5432','bucardo',...)
failed: could not connect to server: No route to host
        Is the server running on host "dbm2" and accepting
        TCP/IP connections on port 5432? at line 90 at
/usr/sbin/bucardo_ctl line 4257.

Being clever, I tried to delete the row in the sync table manually, but
not being clever enough, I don't know how to disable the trigger that
prevents the deletion.
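One generic PostgreSQL way to bypass such a trigger, assuming superuser
access to the bucardo database, is session_replication_role, which
suppresses ordinary triggers for the current session. This is only a
sketch of the technique, not a Bucardo-sanctioned procedure, so try it
on a test system first:

```shell
psql -h 172.17.0.1 -U postgres -d bucardo <<'SQL'
BEGIN;
-- "replica" mode skips ordinary triggers for this transaction only,
-- including the one that tries to connect to the unreachable host.
SET LOCAL session_replication_role = replica;
DELETE FROM sync WHERE name = 'herd_Test_dbm1_dbm2';
COMMIT;
SQL
```

Note that this deletes only the sync row itself; any dependent rows the
trigger would normally clean up may be left behind.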

When I take down the slave host, I can delete the syncs pointing to that
slave host.

Sadly, while one of the nodes (either the second master or the slave) is
down, replication between the first master and the hosts that are still
reachable also stops, until I delete all syncs pointing to the
unreachable hosts.

As I am trying to design a high-availability setup, I have a workaround
for a failed slave: delete the syncs pointing to it. But I can't use the
same workaround when the second master goes down, because I can't delete
the swap sync.
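An alternative I am considering, instead of deleting syncs while a node
is down, is deactivating them, which should not need to contact the
target host. A sketch, assuming bucardo_ctl's activate/deactivate
commands behave that way in this version:

```shell
# Take the syncs involving the failed node m2 out of rotation...
bucardo_ctl deactivate herd_Test_dbm1_dbm2 --dbhost 172.17.0.1

# ...then re-enable and kick them once m2 is reachable again.
bucardo_ctl activate herd_Test_dbm1_dbm2 --dbhost 172.17.0.1
bucardo_ctl kick herd_Test_dbm1_dbm2 --dbhost 172.17.0.1
```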

Thanks for your answers.

Kind regards
Jörn Ott
------------------------------------------------------------
Ott Consult UG (haftungsbeschränkt)
Hauptstr. 11e
53604 Bad Honnef
Telefon: +49 2224 968368
Telefax: +49 2224 940874
E-Mail: mailto:info at ott-consult.de
WWW: http://www.ott-consult.de/
Amtsgericht Siegburg HRB 10574
Geschäftsführender Gesellschafter: Jörn Ott


