[Bucardo-general] kid doesn't start after the serialization failure

Alexey Klyukin alexk at commandprompt.com
Thu Nov 15 20:06:54 UTC 2012


Hi David,

Thank you for your answer.

On Nov 15, 2012, at 7:03 PM, David E. Wheeler <david at justatheory.com> wrote:

> 
> On Nov 15, 2012, at 8:34 AM, Alexey Klyukin <alexk at commandprompt.com> wrote:
> 
>>> I've been testing swap sync and conflict resolution for bucardo 4.8
>> 
>> Sorry, I got the version numbers confused; it's 4.5.
> 
> Bucardo uses the SERIALIZABLE isolation level when copying data to a target. It has not done a good job of handling serialization failures, however.

At least it reports clearly that the failure happened. A logical next step would be to restart the kid that failed, but for some reason the controller doesn't do that on its own. The relevant part of the code (in start_controller) is:

                    $self->glog(qq{Cleaning up aborted sync from q table for "$atarget". PID was $apid});
                    ## Recreate this entry, unless it is already there
                    $count = $sth{qcheck}->execute($syncname,$sourcedb,$atarget);
                    $sth{qcheck}->finish();
                    if ($count >= 1) {
                        $self->glog('Already an empty slot, so not re-adding');
                    }
                    else {
                        $self->glog(qq{Re-adding sync to q table for database "$atarget"});
                        $count = $sth{qinsert}->execute($syncname,$$,$sourcedb,$atarget,$synctype);

and the check it performs (qcheck) is:

    ## Checks if there are any matching entries already in the q
    ## We are only responsible for making sure there is one nullable
    $SQL = q{
        SELECT 1
        FROM   bucardo.q
        WHERE  sync=?
        AND    sourcedb=?
        AND    targetdb=?
        AND    started IS NULL
    };
    $sth{qcheck} = $maindbh->prepare($SQL);

I'm curious why it creates an entry with started = NULL and empty PIDs (qinsert) and then avoids creating new kids while that entry is present.
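
To make the behaviour in question concrete, here is a small Perl model of the check-then-insert step quoted above. It is a hypothetical, in-memory stand-in, not Bucardo code: @q plays the role of bucardo.q, undef stands for "started IS NULL", and the synctype key is an assumed column name. It models only the re-add decision; the place where start_controller decides whether to spawn a kid is not shown in the excerpt and is not modelled here.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for bucardo.q: one hashref per row.
my @q;

# Mirrors the qcheck query: is there a row for this sync/source/target
# whose "started" value is still NULL (undef here)?
sub has_empty_slot {
    my ($sync, $source, $target) = @_;
    return scalar grep {
        $_->{sync} eq $sync
          && $_->{sourcedb} eq $source
          && $_->{targetdb} eq $target
          && !defined $_->{started}
    } @q;
}

# Mirrors the cleanup branch in start_controller: re-add the sync only
# when no empty slot exists. Returns 1 if a row was added, 0 otherwise.
sub readd_aborted_sync {
    my ($sync, $source, $target, $synctype) = @_;
    return 0 if has_empty_slot($sync, $source, $target);
    push @q, {
        sync     => $sync,
        sourcedb => $source,
        targetdb => $target,
        synctype => $synctype,   # assumed column name
        started  => undef,       # i.e. started IS NULL
    };
    return 1;
}
```

Calling readd_aborted_sync twice in a row adds only one row; another row is added only once the existing one has a non-NULL started value. So if nothing ever picks up the empty slot, the queue stays in that state indefinitely.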



> The new 5.0 beta (4.95.6) attempts to address this problem by retrying the copy after a short sleep. 4.5 was supposed to do the same thing, but it seems it never worked quite right.
> 
> Could you give 4.95.6 a try?

I certainly will in the future, but for now I'm restricted to non-beta versions, the latest of which is 4.5.
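
The retry-after-a-short-sleep approach David describes for the 5.0 beta can be sketched in Perl as a generic wrapper. This is an illustration under stated assumptions, not the code in 4.95.6: run_with_retry and its options are hypothetical names, and the only fact taken from PostgreSQL is that serialization failures are reported with SQLSTATE 40001. With DBI and RaiseError enabled, $state would typically be sub { $dbh->state } and rollback would be sub { $dbh->rollback }.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical helper (not Bucardo's actual code): run $work and retry it
# after a short sleep when it fails with SQLSTATE 40001
# (serialization_failure). Any other error is rethrown immediately.
#
#   $work  - coderef doing one attempt of the copy; dies on error
#   $state - coderef returning the SQLSTATE of the last error
#   %opt   - max_attempts (default 3), delay in seconds (default 0.5),
#            rollback: coderef run after every failed attempt
sub run_with_retry {
    my ($work, $state, %opt) = @_;
    my $max      = $opt{max_attempts} // 3;
    my $delay    = $opt{delay}        // 0.5;
    my $rollback = $opt{rollback}     // sub { };

    for my $attempt (1 .. $max) {
        my $ok = eval { $work->(); 1 };
        return $attempt if $ok;            # number of attempts it took

        my $err      = $@;
        my $sqlstate = $state->() // '';   # capture before rolling back
        $rollback->();                     # abort the failed transaction

        die $err if $sqlstate ne '40001';  # not a serialization failure
        die $err if $attempt == $max;      # out of attempts
        Time::HiRes::sleep($delay);        # back off before retrying
    }
    die "max_attempts must be at least 1\n";
}
```

The SQLSTATE is read before the rollback because DBI resets a handle's error state on most subsequent method calls, so reading it afterwards may return nothing useful.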

--
Alexey Klyukin        http://www.commandprompt.com
The PostgreSQL Company – Command Prompt, Inc.