[Bucardo-general] Bucardo weird errors on buzy system

sym39 marechal.sylvain2 at gmail.com
Fri Feb 27 14:05:41 UTC 2015


Hello Greg, thank you for your reply.
> [...]
> If the tables are that busy, you may want to set autokick=0 and rely
> one some other means to start the sync, such as setting it to run
> every 10 seconds. This will reduce the overhead of having a trigger
> on each table.
>
>> 1) How scalable is bucardo : In other words, is there a sync limit
>> in bucardo, that could make a solution with lot of syncs not
>> scalable? For example, are syncs independent or not / dependent of a
>> process that controls all syncs, ie, if one process is blocked, does
>> it have impacts on others?
> They are *mostly* independent. As long as there are no tables shared among
> the syncs, things should run fairly smoothly. There is no sync limit
> per se, but eventually you may see problems with the sheer number of
> notices and perhaps KID processes.

OK.
My bad, the script I wrote to add new events was wrong:
When adding tables, I forgot to include the schema. If things were 
working correctly with only one
event, it was worse and worse when adding new events because instead of 
adding
5 tables in one relgroup, 5 *(number of events) tables were added, 
leading to very bad things.
So correcting my script to only add the tables I really liked to add 
(...) makes things better, no more scaling problem with multiple syncs.

>> Last question, may be related or not : I notice that some sync
>> sometimes become inactive. After that, I find no way to make then
>> work again, using bucardo activate XXX does not solve, and stopping
>> / restarting the daemon does not help, and nothing special is
>> present in the logs to explain what is wrong. So why a sync can
>> become inactive, why and what to do in this case?
> If a database involved in the sync is inactive, the sync will go
> inactive as well. The logs should be giving some clue as to what
> is going on. Try bumping the log_level up (perhaps to DEBUG) and
> see if that gives you a better message.
>
To reproduce this, I "simply" kill postgres (killall -9 postgres) on the 
same host bucardo
daemon is running on. (I am sure there are other ways to reproduce this, 
but at least, this method works)
Then I restart postgres, restart bucardo, and check that fresh data is 
added in all tables.

After that, some of the syncs are still active, the other are stalled, 
and I find no way to
activate them again.
I notice that the CTL and KID processes are not started for those 
stalled sync:
<<<
# ps -ef|grep Bucard
bucardo  28054     1  3 14:49 ?        00:00:00 Bucardo Master Control 
Program v5.3.1. Active syncs: 
event_sync_inf_01,event_sync_inf_02,event_sync_inf_03,event_sync_inf_04,event_sync_inf_05,event_sync_inf_06,event_sync_inf_07,event_sync_inf_08,event_sync_inf_09,event_sync_inf_10,config_sync
bucardo  28184 28054  0 14:49 ?        00:00:00 Bucardo VAC.
bucardo  28197 28054  0 14:49 ?        00:00:00 Bucardo Controller. Sync 
"event_sync_inf_01" for relgroup "event_relgroup_inf_01" to dbs 
"event_dbgroup_inf_01"
bucardo  28198 28054  0 14:49 ?        00:00:00 Bucardo Controller. Sync 
"event_sync_inf_08" for relgroup "event_relgroup_inf_08" to dbs 
"event_dbgroup_inf_08"
bucardo  28200 28197  2 14:49 ?        00:00:00 Bucardo Kid. Sync 
"event_sync_inf_01"
bucardo  28201 28054  0 14:49 ?        00:00:00 Bucardo Controller. Sync 
"event_sync_inf_10" for relgroup "event_relgroup_inf_10" to dbs 
"event_dbgroup_inf_10"
bucardo  28204 28198  2 14:49 ?        00:00:00 Bucardo Kid. Sync 
"event_sync_inf_08"
bucardo  28206 28054  0 14:49 ?        00:00:00 Bucardo Controller. Sync 
"config_sync" for relgroup "config_relgroup" to dbs "config_dbgroup"
bucardo  28208 28201  2 14:49 ?        00:00:00 Bucardo Kid. Sync 
"event_sync_inf_10"
bucardo  28211 28054  0 14:49 ?        00:00:00 Bucardo Controller. Sync 
"event_sync_inf_09" for relgroup "event_relgroup_inf_09" to dbs 
"event_dbgroup_inf_09"
bucardo  28229 28206  2 14:49 ?        00:00:00 Bucardo Kid. Sync 
"config_sync"
bucardo  28232 28211  2 14:49 ?        00:00:00 Bucardo Kid. Sync 
"event_sync_inf_09"

 >>>

<<<
# bucardo status
PID of Bucardo MCP: 28054
  Name                  State    Last good    Time      Last I/D Last 
bad    Time
=====================+========+============+=========+===========+===========+=========
  event_sync_inf_01 | Good   | 13:51:21   | 2s      | 0/0       | 
13:12:49  | 38m 34s
  event_sync_inf_02 | Bad    | 13:12:46   | 38m 36s | 5/14      | 
13:12:49  | 38m 34s
  event_sync_inf_03 | Bad    | 13:12:47   | 38m 36s | 5/14      | 
13:12:49  | 38m 34s
  event_sync_inf_04 | Bad    | 13:12:46   | 38m 36s | 5/14      | 
13:12:49  | 38m 34s
  event_sync_inf_05 | Bad    | 13:12:46   | 38m 36s | 5/14      | 
13:12:49  | 38m 34s
  event_sync_inf_06 | Bad    | 13:12:47   | 38m 36s | 5/14      | 
13:12:49  | 38m 34s
  event_sync_inf_07 | Bad    | 13:12:46   | 38m 36s | 5/14      | 
13:12:49  | 38m 34s
  event_sync_inf_08 | Good   | 13:51:21   | 1s      | 0/0       | 
13:12:49  | 38m 34s
  event_sync_inf_09 | Good   | 13:51:21   | 2s      | 0/0       | 
13:12:49  | 38m 34s
  event_sync_inf_10 | Good   | 13:51:21   | 1s      | 5/14      | 
13:12:49  | 38m 34s
  config_sync       | Good   | 13:51:22   | 0s      | 0/0       | 
13:12:48  | 38m 34s
 >>>

<<<
# bucardo status event_sync_inf_02
======================================================================
Last good                : Feb 27, 2015 13:12:46 (time to run: 1s)
Rows deleted/inserted    : 5 / 14
Last bad                 : Feb 27, 2015 13:12:49 (time until fail: 1s)
Sync name                : event_sync_inf_02
Current state            : Bad
Source relgroup/database : event_relgroup_inf_02 / event_db_bucardo_0
Tables in sync           : 8
Status                   : Stalled
Check time               : None
Overdue time             : 00:00:00
Expired time             : 00:00:00
Stayalive/Kidsalive      : Yes / Yes
Rebuild index            : No
Autokick                 : Yes
Onetimecopy              : No
Post-copy analyze        : Yes
Last error:              : Failed : DBD::Pg::db do failed:  at 
/usr/share/perl5/Bucardo.pm line 3401. Line: 5453 Main DB state: ? 
Error: none DB event_db_bucardo_0 state: ? Error: none DB 
event_db_bucardo_1 state: 22000 Error: 7  (KID 16002)
======================================================================

 >>>

Here some logs (only for one stalled sync the event_sync_inf_02)
<<<
2015-02-27 14:49:36 [ll101] Bucardo info [28054]: MCP Checking stalled 
sync event_sync_inf_02  #local1,devel
2015-02-27 14:49:36 [ll101] Bucardo info [28054]: MCP LISTEN for 
"bucardo_activate_sync_event_sync_inf_02" on "bucardo" (line 7727) 
#local1,devel
2015-02-27 14:49:36 [ll101] Bucardo info [28054]: MCP LISTEN for 
"bucardo_reload_sync_event_sync_inf_02" on "bucardo" (line 7727) 
#local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Activating sync 
"event_sync_inf_02"  #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Running 
validate_sync on "event_sync_inf_02"  #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP   Inspecting 
source sequence "event_inf_02.table4_id_seq" on database 
"event_db_bucardo_0"  #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP   Set conflict 
strategy for event_inf_02.table4_id_seq to "bucardo_latest" #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP   Inspecting 
source sequence "event_inf_02.streams_id_seq" on database 
"event_db_bucardo_0"  #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP   Set conflict 
strategy for event_inf_02.streams_id_seq to "bucardo_latest" #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP   Inspecting 
source sequence "event_inf_02.table1_id_seq" on database 
"event_db_bucardo_0"  #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP   Set conflict 
strategy for event_inf_02.table1_id_seq to "bucardo_latest" #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP   Inspecting 
source table "event_inf_02.table1" on database "event_db_bucardo_0" 
#local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP   Set conflict 
strategy for event_inf_02.table1 to "bucardo_latest"  #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP   Inspecting 
source table "event_inf_02.table4" on database "event_db_bucardo_0" 
#local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP   Set conflict 
strategy for event_inf_02.table4 to "bucardo_latest"  #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP   Inspecting 
source table "event_inf_02.streams" on database "event_db_bucardo_0"  
#local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP   Set conflict 
strategy for event_inf_02.streams to "bucardo_latest"  #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP   Inspecting 
source table "event_inf_02.table3" on database "event_db_bucardo_0" 
#local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP   Set conflict 
strategy for event_inf_02.table3 to "bucardo_latest"  #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP   Inspecting 
source table "event_inf_02.table2" on database "event_db_bucardo_0" 
#local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP   Set conflict 
strategy for event_inf_02.table2 to "bucardo_latest"  #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Listen for 
kick_sync_event_sync_inf_02 on event_db_bucardo_0 (source) #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Listen for 
kick_sync_event_sync_inf_02 on event_db_bucardo_1 (source) #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Sending NOTIFY 
"activated_sync_event_sync_inf_02" (line 7218) skip_commit=0 #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP UNLISTEN for 
"bucardo_activate_sync_event_sync_inf_02" on "bucardo" (line 7220) 
#local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP LISTEN for 
"bucardo_deactivate_sync_event_sync_inf_02" on "bucardo" (line 7222)  
#local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP LISTEN for 
"bucardo_kick_sync_event_sync_inf_02" on "bucardo" (line 7223) #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP LISTEN for 
"bucardo_pause_sync_event_sync_inf_02" on "bucardo" (line 7224) 
#local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP LISTEN for 
"bucardo_resume_sync_event_sync_inf_02" on "bucardo" (line 7225) 
#local1,devel
2015-02-27 14:49:39 [ll101] Bucardo info [28054]: MCP Got NOTICE 
kick_sync_event_sync_inf_02 5 times from 18258 (line 980) #local1,devel
2015-02-27 14:49:40 [ll101] Bucardo info [28054]: MCP Got NOTICE 
kick_sync_event_sync_inf_02 from 7130 (line 980)  #local1,devel
2015-02-27 14:49:40 [ll101] Bucardo info [28054]: MCP Got NOTICE 
kick_sync_event_sync_inf_02 4 times from 7131 (and 2 other PIDs) (line 
980)  #local1,devel
2015-02-27 14:49:42 [ll101] Bucardo info [28054]: MCP Got NOTICE 
kick_sync_event_sync_inf_02 3 times from 3668 (and 2 other PIDs) (line 
980)  #local1,devel
2015-02-27 14:49:42 [ll101] Bucardo info [28054]: MCP Got NOTICE 
kick_sync_event_sync_inf_02 2 times from 7128 (line 980) #local1,devel
2015-02-27 14:49:44 [ll101] Bucardo info [28054]: MCP Got NOTICE 
kick_sync_event_sync_inf_02 3 times from 7130 (line 980) #local1,devel
2015-02-27 14:49:44 [ll101] Bucardo info [28054]: MCP Got NOTICE 
kick_sync_event_sync_inf_02 2 times from 7128 (line 980) #local1,devel
2015-02-27 14:49:46 [ll101] Bucardo info [28054]: MCP Got NOTICE 
kick_sync_event_sync_inf_02 from 7128 (line 980)  #local1,devel
2015-02-27 14:49:46 [ll101] Bucardo info [28054]: MCP Got NOTICE 
kick_sync_event_sync_inf_02 4 times from 7130 (and 2 other PIDs) (line 
980)  #local1,devel

 >>>

Is there something I can do to make things working without 
removing/adding the sync corresponding to stalled events in bucardo ?

Thanks and regards,

Sylvain



More information about the Bucardo-general mailing list