[Bucardo-general] Another bug?

Michelle Sullivan michelle at sorbs.net
Tue Oct 22 14:00:17 UTC 2013


Michelle Sullivan wrote:
> (31523) [Tue Oct 15 09:41:25 2013] KID Totals: deletes=69 inserts=339 conflicts=1
> (31523) [Tue Oct 15 09:41:26 2013] KID Expected one row from end_syncrun, but got 4
> (31523) [Tue Oct 15 09:41:26 2013] KID Unable to correctly update syncrun table! (count was 4)
> (31523) [Tue Oct 15 09:41:30 2013] KID Expected one row from end_syncrun, but got 4
> (31523) [Tue Oct 15 09:41:35 2013] KID Expected one row from end_syncrun, but got 4
> (31523) [Tue Oct 15 09:41:41 2013] KID Expected one row from end_syncrun, but got 4
> (31523) [Tue Oct 15 09:41:45 2013] KID Delta count for sorbs_corkscrew.public.audit           : 1
> (31523) [Tue Oct 15 09:41:46 2013] KID Totals: deletes=3 inserts=3 conflicts=0
> (31523) [Tue Oct 15 09:41:47 2013] KID Expected one row from end_syncrun, but got 4
> (31523) [Tue Oct 15 09:41:47 2013] KID Unable to correctly update syncrun table! (count was 4)
> (31523) [Tue Oct 15 09:41:51 2013] KID Expected one row from end_syncrun, but got 4
>
>   

This might just fix it. In addition, the lower part of the patch stops
the kid from exiting if pg_cancel fails (usually because of a
serialization error), which causes the majority of orphaned entries on
my systems:

--- Bucardo.pm.orig     2013-10-14 10:44:09.000000000 +0000
+++ Bucardo.pm  2013-10-22 13:58:57.000000000 +0000
@@ -1900,6 +1900,18 @@
             ## At this point, the PID file does not exist or the kid is not responding
             if ($resurrect) {
                 ## XXX Try harder to kill it?
+
+                ## First clear out any old entries in the syncrun table
+                $sth = $sth{ctl_syncrun_end_now};
+                $count = $sth->execute("Old entry died (CTL $$)", $syncname);
+                if (1 == $count) {
+                    $info = $sth->fetchall_arrayref()->[0][0];
+                    $self->glog("Old syncrun entry removed during resurrection, start time was $info", LOG_NORMAL);
+                }
+                else {
+                    $sth->finish();
+                }
+
                 $self->glog("Resurrecting kid $syncname, resurrect was $resurrect", LOG_DEBUG);
                 $self->{kidpid} = $self->create_newkid($sync);
 
@@ -4823,8 +4835,10 @@
         ## Roll everyone back
         for my $dbname (@dbs_dbi) {
             my $dbh = $sync->{db}{$dbname}{dbh};
-            $dbh->pg_cancel if $dbh->{pg_async_status} > 0;
-            $dbh->rollback;
+            ## Wrapped in an eval as a failure to serialise can cause an abort() and the KID will die.
+            eval { $dbh->pg_cancel if $dbh->{pg_async_status} > 0; };
+            ## Separate eval{} for the rollback as we are probably still connected to the transaction.
+            eval { $dbh->rollback; };
         }
 
         # End the syncrun.
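
For anyone who wants to try the second hunk's idea outside of Bucardo, here is a minimal standalone sketch of the same pattern: each cleanup step gets its own eval, so a failed pg_cancel cannot prevent the rollback on that handle or on the remaining ones. MockHandle and rollback_all are hypothetical names made up for this illustration and are not part of Bucardo or DBD::Pg. Unlike the patch, which silently discards the error, this version collects the messages so they could be logged.

```perl
use strict;
use warnings;

## Hypothetical stand-in for a database handle, for illustration only.
package MockHandle;

sub new {
    my ($class, %args) = @_;
    return bless { pg_async_status => 0, rolled_back => 0, %args }, $class;
}

sub pg_cancel {
    my $self = shift;
    ## Simulate the serialization failure described above
    die "could not serialize access\n" if $self->{fail_cancel};
    return 1;
}

sub rollback {
    my $self = shift;
    $self->{rolled_back} = 1;
    return 1;
}

package main;

## Cancel any in-flight async query, then roll back, for every handle.
## Each step has its own eval so that one failure cannot skip the rest.
sub rollback_all {
    my @handles = @_;
    my @errors;
    for my $dbh (@handles) {
        eval { $dbh->pg_cancel if $dbh->{pg_async_status} > 0; 1 }
            or push @errors, "pg_cancel failed: $@";
        eval { $dbh->rollback; 1 }
            or push @errors, "rollback failed: $@";
    }
    return @errors;
}

my @dbs = (
    MockHandle->new(pg_async_status => 1, fail_cancel => 1),
    MockHandle->new(),
);
my @errors = rollback_all(@dbs);
print "errors: ", scalar(@errors), "\n";
print "rolled back: ", scalar(grep { $_->{rolled_back} } @dbs), "\n";
```

With the first mock handle set to fail its cancel, both handles should still end up rolled back, which is the behaviour the patch is after.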






-- 
Michelle Sullivan
http://www.mhix.org/


