[Bucardo-general] kid doesn't start after the serialization failure

Alexey Klyukin alexk at commandprompt.com
Thu Nov 15 16:34:28 UTC 2012



On Nov 15, 2012, at 5:34 PM, Alexey Klyukin <alexk at commandprompt.com> wrote:

> 
> 
> Hi,
> 
> I've been testing swap sync and conflict resolution for bucardo 4.8

Sorry, I mixed up the version numbers: it is 4.5.

> and found that kids quite often die with the following error message during nearly concurrent updates (i.e. when manually updating the same row on both source and target in order to simulate a conflict):
> 
> [Thu Nov 15 15:26:05 2012]  KID No conflict, target only for public.products.prod_id: 10006
> [Thu Nov 15 15:26:05 2012]  KID Action summary: 2:1
> [Thu Nov 15 15:26:05 2012]  KID [1/1] public.products UPDATE target to source pk 10006
> 'Warning! Aborting due to exception for public.products.prod_id: 10006 Error was DBD::Pg::st execute failed: ERROR:  could not serialize access due to concurrent update at /usr/local/share/perl/5.10.1/Bucardo.pm line 5776.'
> [Thu Nov 15 15:26:05 2012]  KID Final database backend PID is 27203
> [Thu Nov 15 15:26:05 2012]  KID Kid exiting at cleanup_kid. Reason: Died at /usr/local/share/perl/5.10.1/Bucardo.pm line 5835.
> [Thu Nov 15 15:26:05 2012]  KID Removed pid file "/var/run/bucardo/bucardo.kid.sync.dellstore2_swap.zen_dellstore2.pid"
> [Thu Nov 15 15:26:14 2012]  CTL Rows updated child 27199 to aborted in q: 1
> [Thu Nov 15 15:26:14 2012]  CTL Warning! Kid 27199 seems to have died. Sync "dellstore2_swap"
> [Thu Nov 15 15:26:24 2012]  CTL Cleaning up aborted sync from q table for "zen_dellstore2". PID was 27199
> [Thu Nov 15 15:26:24 2012]  CTL Already an empty slot, so not re-adding
> 
> 
> After the sync is kicked, bucardo finds delta rows, detects a conflict due to updates for the same rows and successfully resolves it:
> 
> [Thu Nov 15 15:31:21 2012]  KID Total delta count: 2
> [Thu Nov 15 15:31:21 2012]  KID Logged details of conflict to bucardo_conflict.log
> [Thu Nov 15 15:31:21 2012]  KID Conflict detected for public.products:10006. Using standard conflict "target"
> [Thu Nov 15 15:31:21 2012]  KID Action summary: 2:1
> [Thu Nov 15 15:31:21 2012]  KID [1/1] public.products UPDATE target to source pk 10006
> [Thu Nov 15 15:31:21 2012]  KID Updating bucardo_track for public.products on blade_dellstore2
> [Thu Nov 15 15:31:21 2012]  KID Updating bucardo_track for public.products on zen_dellstore2
> [Thu Nov 15 15:31:21 2012]  KID Issuing final commit for source and target
> 
> The problem is that the kid is not restarted automatically. I'm not sure whether this has something to do with the 'Already an empty slot, so not re-adding' message above. One workaround I found is to set the sync's checktime to a non-zero value, so that pending delta rows are detected and replicated, but I wonder whether Bucardo should restart the kid automatically after such a failure, given that the keepalive flag is set for the sync?
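
For reference, PostgreSQL reports "could not serialize access due to concurrent update" with SQLSTATE 40001, and the usual client-side response is to retry the whole transaction. Below is a minimal Python sketch of that pattern. It is not how Bucardo handles the error; `SerializationFailure`, `run_with_retry`, and its parameters are hypothetical names chosen for illustration, and no real database driver is involved.

```python
import random
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver error carrying SQLSTATE 40001."""
    sqlstate = "40001"


def run_with_retry(txn, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call txn() until it succeeds or max_attempts is exhausted.

    Only SerializationFailure is retried; any other exception propagates
    immediately. The last SerializationFailure is re-raised if every
    attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter, so that two writers that
            # collided once are less likely to collide again.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky_update():
        # Simulated transaction: fails twice, then succeeds.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise SerializationFailure(
                "could not serialize access due to concurrent update")
        return "committed"

    print(run_with_retry(flaky_update), "after", attempts["n"], "attempts")
```

The point of the sketch is only that a serialization failure is normally treated as transient, so a single occurrence does not have to end the worker process.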
> 
> Thank you,
> --
> Alexey Klyukin        http://www.commandprompt.com
> The PostgreSQL Company – Command Prompt, Inc.

--
Alexey Klyukin        http://www.commandprompt.com
The PostgreSQL Company – Command Prompt, Inc.





