[Bucardo-general] Eek - can't figure out how to fix this one....

Michelle Sullivan michelle at sorbs.net
Sun Aug 1 20:14:03 UTC 2010


Karl Denninger wrote:
> Michelle Sullivan wrote:
>> Karl Denninger wrote:
>>   
>>> Ah, it appears that when the installation script runs, it creates the
>>> user and sets a non-default search path of its own.
>>>
>>> Hmmm... that sounds like a bug, as it's definitely undocumented - and
>>> furthermore, if the remote and local hosts are different (and thus you
>>> created the "bucardo" user on the remote using the usual "createuser"
>>> command), you suddenly have a problem just like this.
>>>
>>> Will investigate; it does appear that there may be fruitful results
>>> found here....
>>>     
>>
>>
>> As for me, I used pgsql on the masters and the slave, so I would not
>> run into this problem - which would explain why I haven't seen it ;-)
>>
>> Michelle
>>   
> That appears to have been the problem - sticking a specific "set
> search_path" in the role for the bucardo account on the slave fixed it.
>
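(For the archives: that role-level fix is a one-liner on the slave.  The
schema list below is only a guess at what the install script would have
set, so check what it actually uses on your boxes.)

    ALTER ROLE bucardo SET search_path = bucardo, public;
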
> Incidentally, my interest in this is due to SLONY blowing chunks on me
> after a few years of successful use - it apparently LOST a handful of
> syncs (!!), resulting in an out-of-sync database table - a large
> delete then failed quite a long time later and hosed me, as there was
> no way to get that table back in sync - Slony cannot be told to
> re-copy only ONE table out of a set, and the (valid) DELETEs on the
> master could never complete on the slave.  Permanent toilet-stoppage
> there.

I couldn't get Slony working on anything after my DB exceeded 20GB!

>
> Unfortunately, an attempted resync of the entire database (which SLONY
> doesn't "officially support", but isn't hard to force with a forced
> drop of the schema on the slave, then a re-insertion of the
> subscriptions) failed too - on a ~30GB table!  It failed repeatedly, and
> it appears that the problem is somewhere inside the SLON process, in its
> memory management, although I can't prove that yet.  As a result, on
> this particular database I am now without a working replication
> system, and that kinda sucks (to put it mildly.)  Slony had some sort
> of problem with memory management with 2.0.3 - I was on 2.0.2 with
> Postgres 8.4.4 for a good long while, but 2.0.4 refuses to resync the
> suspect table, as did 2.0.2, so I'm good and solidly hosed right now.
A new install of 2.0.4 failed to even start for me.
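(For anyone finding this in the archives: the "unofficial" full resync
Karl describes is roughly the shape below - wipe the Slony schema on the
subscriber, then re-issue the subscription with slonik.  The cluster name,
node numbers and conninfo strings are placeholders, and depending on how
the cluster was built you may also have to redo the store node / store
path steps before the subscribe will take.)

    # 1. on the subscriber, throw away all Slony state for the cluster
    psql -d mydb -c 'DROP SCHEMA "_mycluster" CASCADE;'

    # 2. re-issue the subscription from the origin
    slonik <<EOF
    cluster name = mycluster;
    node 1 admin conninfo = 'dbname=mydb host=master user=slony';
    node 2 admin conninfo = 'dbname=mydb host=slave user=slony';
    subscribe set (id = 1, provider = 1, receiver = 2, forward = no);
    EOF
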

>
> As such I seek a working tool to replace SLONY with.  PostgreSQL 9
> appears to have internal replication, but if I'm reading the docs right
> there is no "initial sync" capability included in it, and it's "all or
> nothing."  Meh.  That's unfortunate too as that would work quite well
> if that initial-sync problem didn't exist, but it does.  I can live
> with the "all or nothing" by running a separate instance of the server
> if I must in a given case, but I can't live with having to manually
> force both servers into sync as that means I have to take the
> application offline long enough to do that, and if something goes
> wrong that process has to be repeated - and the databases on which I
> use this are not small.  I can lose the high-availability capability
> for a while until the sync completes, but not all access.
Yeah, I looked at that as well...
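
(For anyone comparing notes: the built-in 9.0 replication is configured
per cluster rather than per table, which is where the "all or nothing"
comes from.  The settings are roughly the following - hostnames and the
user are placeholders - and the initial copy is a file-level base backup
of the whole cluster rather than anything table-aware.)

    # postgresql.conf on the primary (9.0)
    wal_level = hot_standby
    max_wal_senders = 3

    # recovery.conf on the standby, after restoring a base backup
    standby_mode = 'on'
    primary_conninfo = 'host=primary.example.com port=5432 user=replicator'
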
>
> It also appears that Bucardo does not lock the slave tables with a
> trigger to prevent modifications on the slave nodes.

That is documented and quite deliberate (I believe)... Bucardo can do
multi (two) master mode - so you wouldn't want to lock both masters.
However, master-to-slave would indeed give rise to problems if something
were written to the slave that broke a referential integrity check for a
subsequent write by the master.  That said, as it has been explained to
me, Bucardo applies changes as delete/insert, so if data is written to a
slave it is overwritten by master updates should there be a clash.

> SLONY handles this internally - you can query against a slave but are
> prevented from writing to it.  That stops erroneous code from screwing
> you by modifying the table out from under you when it's a slave node,
> which is rather important - particularly if one of the key fields gets
> updated, which could cause an UPDATE or DELETE to fail down the road. 
> Bucardo appears to lack this protection.  This could be particularly
> problematic in the case where a master goes down, your software fails
> over and then the master comes back up - if you fail to "shoot the
> master in the head" in that instance it could make a hell of a mess
> out of the node.  It gets worse if there's a referential integrity
> problem created in the interim on the slave.  IMHO Bucardo ought to
> protect mods on the slave systems (and provide a relatively-simple
> means to drop that protection for failover purposes.)  SLONY handles
> this by allowing one to "promote" a slave to the master, flipping the
> triggers around and making it possible to resync the other way without
> too much drama when the faulted machine comes back up.
>
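For what it's worth, you can bolt that protection on yourself with an
ordinary trigger on each replicated table on the slave - something like
the sketch below (table and function names are placeholders).  If Bucardo
applies its changes with session_replication_role set to 'replica', which
I believe it does on 8.3 and later, a normal trigger like this will not
fire for the replication writes, only for stray application writes, and
dropping or disabling it is your failover switch.

    -- reject direct writes to a table while it is acting as a replica
    CREATE OR REPLACE FUNCTION deny_replica_writes() RETURNS trigger AS $$
    BEGIN
        RAISE EXCEPTION 'table % is read-only on this replica', TG_TABLE_NAME;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER mytable_readonly
        BEFORE INSERT OR UPDATE OR DELETE ON mytable
        FOR EACH ROW EXECUTE PROCEDURE deny_replica_writes();

    -- to fail over: DROP TRIGGER mytable_readonly ON mytable;
    -- (or: ALTER TABLE mytable DISABLE TRIGGER mytable_readonly;)
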
> If I can figure out how to solve those apparent problems Bucardo looks
> VERY interesting.  I particularly like that I can force a resync of a
> single table in the system if for some reason it becomes necessary. 
> In the instance that killed me with SLONY, this would have prevented
> the system from going out of sync on me as I could have recovered the
> single table that appears to have caused my problem, and all would
> have been well with the world.
I have run into a number of issues - however, Bucardo's self-recovery of
out-of-sync tables is impressive.  It seems that if you can run multiple
syncs you can have a very nice system; however, if, like me, you have
58 tables of which 54 are linked to each other with foreign key
constraints (particularly to an audit table), then you have to sync the
whole DB with one sync, and the time that takes is quite large (it was
20 minutes per sync - but I am inserting between 1m and 10m rows per day
against multiple tables in that sync).

Michelle


