[Bucardo-general] Eek - can't figure out how to fix this one....

Sun Aug 1 19:57:32 UTC 2010

Michelle Sullivan wrote:
> Karl Denninger wrote:
>   
>> Ah, it appears that when the installation script runs, it creates the
>> user and sets a non-default search path of its own.
>>
>> Hmmm... that sounds like a bug, as it's definitely undocumented - and
>> furthermore, if the remote and local hosts are different (and thus you
>> created the "bucardo" user on the remote using the usual "createuser"
>> command), you suddenly have a problem just like this.
>>
>> Will investigate; it does appear that there may be fruitful results
>> found here....
>>     
>
>
> And I used pgsql on the masters and slave so that I would not run into
> this problem, which would explain why I haven't seen it ;-)
>
> Michelle
>   
That appears to have been the problem: adding a specific search_path
setting to the bucardo role on the slave fixed it.
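For anyone who hits the same thing: the fix amounts to a per-role
setting, i.e. "ALTER ROLE ... SET search_path = ..." run on the slave.
The Python sketch below only builds that statement.  The role name
"bucardo" and the schema list "bucardo, public" are assumptions for
illustration, so check which schemas your Bucardo install actually uses.

```python
"""Sketch: build an ALTER ROLE statement that pins a role's search_path.

Assumptions (hypothetical, not taken from Bucardo itself): the role is
named "bucardo" and the desired path is the schemas "bucardo", "public".
"""


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def set_search_path_sql(role: str, schemas: list[str]) -> str:
    """Return SQL that makes new sessions for `role` default to `schemas`."""
    if not schemas:
        raise ValueError("search_path needs at least one schema")
    path = ", ".join(quote_ident(s) for s in schemas)
    return f"ALTER ROLE {quote_ident(role)} SET search_path = {path};"


if __name__ == "__main__":
    # Run the printed statement on the slave as a superuser, then reconnect:
    # a per-role setting only applies to sessions started afterwards.
    print(set_search_path_sql("bucardo", ["bucardo", "public"]))
```

A per-role setting is used here (rather than editing postgresql.conf)
so that only the replication account is affected and other users of the
slave keep their own search_path.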

Incidentally, my interest in this is due to SLONY blowing chunks on me
after a few years of successful use.  It apparently LOST a handful of
syncs (!!), leaving one table out of sync.  Quite a long time later a
large delete failed and hosed me, because there was no way to get that
table back in sync: Slony cannot be told to re-copy only ONE table out
of a set, and the (valid) DELETEs on the master could never complete on
the slave.  Permanent toilet-stoppage there.

Unfortunately, an attempted resync of the entire database (which SLONY
doesn't "officially support", but which isn't hard to force by dropping
the schema on the slave and then re-adding the subscriptions) failed
too, repeatedly, on a ~30GB table!  The problem appears to be memory
management somewhere inside the slon process, although I can't prove
that yet.  As a result, I am now without a working replication system
on this particular database, and that kinda sucks (to put it mildly).
Slony 2.0.3 had some sort of memory-management problem; I was on 2.0.2
with Postgres 8.4.4 for a good long while, but 2.0.4 refuses to resync
the suspect table, as did 2.0.2, so I'm good and solidly hosed right
now.

As such, I'm looking for a working tool to replace SLONY.  PostgreSQL 9
appears to have built-in replication, but if I'm reading the docs right
it has no "initial sync" capability, and it's "all or nothing."  Meh.
That's unfortunate too, as it would work quite well if that initial-sync
problem didn't exist, but it does.  I can live with "all or nothing" by
running a separate server instance where I must, but I can't live with
having to force both servers into sync manually: that means taking the
application offline long enough to do it, if something goes wrong the
process has to be repeated, and the databases I use this on are not
small.  I can lose high availability for a while until the sync
completes, but not all access.

It also appears that Bucardo does not lock the slave tables with a
trigger to prevent modifications on the slave nodes.  SLONY handles this
internally: you can query against a slave but are prevented from
writing to it.  That stops erroneous code from screwing you by modifying
a table out from under replication while it's a slave node, which is
rather important, particularly if one of the key fields gets updated,
since that could cause an UPDATE or DELETE to fail down the road.
Bucardo appears to lack this protection.  This could be particularly
problematic when a master goes down, your software fails over, and then
the master comes back up: if you fail to "shoot the master in the head"
in that situation, it could make a hell of a mess of the node.  It gets
worse if a referential integrity problem is created on the slave in the
interim.  IMHO Bucardo ought to block modifications on the slave systems
(and provide a relatively simple means to drop that protection for
failover purposes).  SLONY handles this by allowing one to "promote" a
slave to master, flipping the triggers around so the other direction can
be resynced without too much drama when the faulted machine comes back
up.
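To make the idea concrete, here is a rough sketch of the kind of
protection I mean, written as Python that just generates the SQL.  The
function name, trigger name and example table are all hypothetical;
this is not something Bucardo or Slony creates.  It assumes the process
applying replicated changes runs with session_replication_role = replica,
in which case a trigger in the default (ORIGIN) enable mode does not
fire for it, while ordinary client sessions get an error.

```python
"""Sketch: generate SQL for a write-blocking trigger on a replica table.

Hypothetical names throughout.  Assumes the replication process sets
session_replication_role = replica, so this ORIGIN-mode trigger does not
fire for it.  `table` is an unqualified table name.
"""

FUNC = "deny_replica_writes"  # hypothetical trigger function name


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def trigger_name(table: str) -> str:
    """Quoted name of the (hypothetical) blocking trigger for `table`."""
    return quote_ident(f"{table}_deny_writes")


def protect_sql(table: str) -> str:
    """SQL that makes INSERT/UPDATE/DELETE on `table` raise an error."""
    return "\n".join([
        f"CREATE OR REPLACE FUNCTION {FUNC}() RETURNS trigger AS $$",
        "BEGIN",
        "  RAISE EXCEPTION 'table % is a replica; writes are refused', TG_TABLE_NAME;",
        "END;",
        "$$ LANGUAGE plpgsql;",
        f"CREATE TRIGGER {trigger_name(table)} BEFORE INSERT OR UPDATE OR DELETE",
        f"  ON {quote_ident(table)} FOR EACH ROW EXECUTE PROCEDURE {FUNC}();",
    ])


def promote_sql(table: str) -> str:
    """SQL that lifts the protection when this node is promoted to master."""
    return f"ALTER TABLE {quote_ident(table)} DISABLE TRIGGER {trigger_name(table)};"


def demote_sql(table: str) -> str:
    """SQL that restores the protection when a node becomes a slave again."""
    return f"ALTER TABLE {quote_ident(table)} ENABLE TRIGGER {trigger_name(table)};"


if __name__ == "__main__":
    print(protect_sql("orders"))  # "orders" is a made-up example table
    print(promote_sql("orders"))
```

Failover then becomes "disable the trigger on the new master, enable it
on the old one before it rejoins as a slave", which is roughly the
"flip the triggers around" behaviour described above.  On 8.4, plpgsql
may first need to be installed in the database with CREATE LANGUAGE.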

If I can figure out how to solve those apparent problems, Bucardo looks
VERY interesting.  I particularly like that I can force a resync of a
single table if for some reason it becomes necessary.  In the case that
killed me with SLONY, that would have let me get the system back in
sync: I could have recovered the single table that appears to have
caused my problem, and all would have been well with the world.

-- Karl