[Bucardo-general] Bucardo weird errors on buzy system
sym39
marechal.sylvain2 at gmail.com
Fri Feb 27 14:05:41 UTC 2015
Hello Greg, thank you for your reply.
> [...]
> If the tables are that busy, you may want to set autokick=0 and rely
> one some other means to start the sync, such as setting it to run
> every 10 seconds. This will reduce the overhead of having a trigger
> on each table.
>
>> 1) How scalable is bucardo : In other words, is there a sync limit
>> in bucardo, that could make a solution with lot of syncs not
>> scalable? For example, are syncs independent or not / dependent of a
>> process that controls all syncs, ie, if one process is blocked, does
>> it have impacts on others?
> They are *mostly* independent. As long as there are no tables shared among
> the syncs, things should run fairly smoothly. There is no sync limit
> per se, but eventually you may see problems with the sheer number of
> notices and perhaps KID processes.
OK.
My bad, the script I wrote to add new events was wrong:
When adding tables, I forgot to include the schema. If things were
working correctly with only one
event, it was worse and worse when adding new events because instead of
adding
5 tables in one relgroup, 5 *(number of events) tables were added,
leading to very bad things.
So correcting my script to only add the tables I really liked to add
(...) makes things better, no more scaling problem with multiple syncs.
>> Last question, may be related or not : I notice that some sync
>> sometimes become inactive. After that, I find no way to make then
>> work again, using bucardo activate XXX does not solve, and stopping
>> / restarting the daemon does not help, and nothing special is
>> present in the logs to explain what is wrong. So why a sync can
>> become inactive, why and what to do in this case?
> If a database involved in the sync is inactive, the sync will go
> inactive as well. The logs should be giving some clue as to what
> is going on. Try bumping the log_level up (perhaps to DEBUG) and
> see if that gives you a better message.
>
To reproduce this, I "simply" kill postgres (killall -9 postgres) on the
same host bucardo
daemon is running on. (I am sure there are other ways to reproduce this,
but at least, this method works)
Then I restart postgres, restart bucardo, and check that fresh data is
added in all tables.
After that, some of the syncs are still active, the other are stalled,
and I find no way to
activate them again.
I notice that the CTL and KID processes are not started for those
stalled sync:
<<<
# ps -ef|grep Bucard
bucardo 28054 1 3 14:49 ? 00:00:00 Bucardo Master Control
Program v5.3.1. Active syncs:
event_sync_inf_01,event_sync_inf_02,event_sync_inf_03,event_sync_inf_04,event_sync_inf_05,event_sync_inf_06,event_sync_inf_07,event_sync_inf_08,event_sync_inf_09,event_sync_inf_10,config_sync
bucardo 28184 28054 0 14:49 ? 00:00:00 Bucardo VAC.
bucardo 28197 28054 0 14:49 ? 00:00:00 Bucardo Controller. Sync
"event_sync_inf_01" for relgroup "event_relgroup_inf_01" to dbs
"event_dbgroup_inf_01"
bucardo 28198 28054 0 14:49 ? 00:00:00 Bucardo Controller. Sync
"event_sync_inf_08" for relgroup "event_relgroup_inf_08" to dbs
"event_dbgroup_inf_08"
bucardo 28200 28197 2 14:49 ? 00:00:00 Bucardo Kid. Sync
"event_sync_inf_01"
bucardo 28201 28054 0 14:49 ? 00:00:00 Bucardo Controller. Sync
"event_sync_inf_10" for relgroup "event_relgroup_inf_10" to dbs
"event_dbgroup_inf_10"
bucardo 28204 28198 2 14:49 ? 00:00:00 Bucardo Kid. Sync
"event_sync_inf_08"
bucardo 28206 28054 0 14:49 ? 00:00:00 Bucardo Controller. Sync
"config_sync" for relgroup "config_relgroup" to dbs "config_dbgroup"
bucardo 28208 28201 2 14:49 ? 00:00:00 Bucardo Kid. Sync
"event_sync_inf_10"
bucardo 28211 28054 0 14:49 ? 00:00:00 Bucardo Controller. Sync
"event_sync_inf_09" for relgroup "event_relgroup_inf_09" to dbs
"event_dbgroup_inf_09"
bucardo 28229 28206 2 14:49 ? 00:00:00 Bucardo Kid. Sync
"config_sync"
bucardo 28232 28211 2 14:49 ? 00:00:00 Bucardo Kid. Sync
"event_sync_inf_09"
>>>
<<<
# bucardo status
PID of Bucardo MCP: 28054
Name State Last good Time Last I/D Last
bad Time
=====================+========+============+=========+===========+===========+=========
event_sync_inf_01 | Good | 13:51:21 | 2s | 0/0 |
13:12:49 | 38m 34s
event_sync_inf_02 | Bad | 13:12:46 | 38m 36s | 5/14 |
13:12:49 | 38m 34s
event_sync_inf_03 | Bad | 13:12:47 | 38m 36s | 5/14 |
13:12:49 | 38m 34s
event_sync_inf_04 | Bad | 13:12:46 | 38m 36s | 5/14 |
13:12:49 | 38m 34s
event_sync_inf_05 | Bad | 13:12:46 | 38m 36s | 5/14 |
13:12:49 | 38m 34s
event_sync_inf_06 | Bad | 13:12:47 | 38m 36s | 5/14 |
13:12:49 | 38m 34s
event_sync_inf_07 | Bad | 13:12:46 | 38m 36s | 5/14 |
13:12:49 | 38m 34s
event_sync_inf_08 | Good | 13:51:21 | 1s | 0/0 |
13:12:49 | 38m 34s
event_sync_inf_09 | Good | 13:51:21 | 2s | 0/0 |
13:12:49 | 38m 34s
event_sync_inf_10 | Good | 13:51:21 | 1s | 5/14 |
13:12:49 | 38m 34s
config_sync | Good | 13:51:22 | 0s | 0/0 |
13:12:48 | 38m 34s
>>>
<<<
# bucardo status event_sync_inf_02
======================================================================
Last good : Feb 27, 2015 13:12:46 (time to run: 1s)
Rows deleted/inserted : 5 / 14
Last bad : Feb 27, 2015 13:12:49 (time until fail: 1s)
Sync name : event_sync_inf_02
Current state : Bad
Source relgroup/database : event_relgroup_inf_02 / event_db_bucardo_0
Tables in sync : 8
Status : Stalled
Check time : None
Overdue time : 00:00:00
Expired time : 00:00:00
Stayalive/Kidsalive : Yes / Yes
Rebuild index : No
Autokick : Yes
Onetimecopy : No
Post-copy analyze : Yes
Last error: : Failed : DBD::Pg::db do failed: at
/usr/share/perl5/Bucardo.pm line 3401. Line: 5453 Main DB state: ?
Error: none DB event_db_bucardo_0 state: ? Error: none DB
event_db_bucardo_1 state: 22000 Error: 7 (KID 16002)
======================================================================
>>>
Here some logs (only for one stalled sync the event_sync_inf_02)
<<<
2015-02-27 14:49:36 [ll101] Bucardo info [28054]: MCP Checking stalled
sync event_sync_inf_02 #local1,devel
2015-02-27 14:49:36 [ll101] Bucardo info [28054]: MCP LISTEN for
"bucardo_activate_sync_event_sync_inf_02" on "bucardo" (line 7727)
#local1,devel
2015-02-27 14:49:36 [ll101] Bucardo info [28054]: MCP LISTEN for
"bucardo_reload_sync_event_sync_inf_02" on "bucardo" (line 7727)
#local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Activating sync
"event_sync_inf_02" #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Running
validate_sync on "event_sync_inf_02" #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Inspecting
source sequence "event_inf_02.table4_id_seq" on database
"event_db_bucardo_0" #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Set conflict
strategy for event_inf_02.table4_id_seq to "bucardo_latest" #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Inspecting
source sequence "event_inf_02.streams_id_seq" on database
"event_db_bucardo_0" #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Set conflict
strategy for event_inf_02.streams_id_seq to "bucardo_latest" #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Inspecting
source sequence "event_inf_02.table1_id_seq" on database
"event_db_bucardo_0" #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Set conflict
strategy for event_inf_02.table1_id_seq to "bucardo_latest" #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Inspecting
source table "event_inf_02.table1" on database "event_db_bucardo_0"
#local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Set conflict
strategy for event_inf_02.table1 to "bucardo_latest" #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Inspecting
source table "event_inf_02.table4" on database "event_db_bucardo_0"
#local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Set conflict
strategy for event_inf_02.table4 to "bucardo_latest" #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Inspecting
source table "event_inf_02.streams" on database "event_db_bucardo_0"
#local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Set conflict
strategy for event_inf_02.streams to "bucardo_latest" #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Inspecting
source table "event_inf_02.table3" on database "event_db_bucardo_0"
#local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Set conflict
strategy for event_inf_02.table3 to "bucardo_latest" #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Inspecting
source table "event_inf_02.table2" on database "event_db_bucardo_0"
#local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Set conflict
strategy for event_inf_02.table2 to "bucardo_latest" #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Listen for
kick_sync_event_sync_inf_02 on event_db_bucardo_0 (source) #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Listen for
kick_sync_event_sync_inf_02 on event_db_bucardo_1 (source) #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP Sending NOTIFY
"activated_sync_event_sync_inf_02" (line 7218) skip_commit=0 #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP UNLISTEN for
"bucardo_activate_sync_event_sync_inf_02" on "bucardo" (line 7220)
#local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP LISTEN for
"bucardo_deactivate_sync_event_sync_inf_02" on "bucardo" (line 7222)
#local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP LISTEN for
"bucardo_kick_sync_event_sync_inf_02" on "bucardo" (line 7223) #local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP LISTEN for
"bucardo_pause_sync_event_sync_inf_02" on "bucardo" (line 7224)
#local1,devel
2015-02-27 14:49:37 [ll101] Bucardo info [28054]: MCP LISTEN for
"bucardo_resume_sync_event_sync_inf_02" on "bucardo" (line 7225)
#local1,devel
2015-02-27 14:49:39 [ll101] Bucardo info [28054]: MCP Got NOTICE
kick_sync_event_sync_inf_02 5 times from 18258 (line 980) #local1,devel
2015-02-27 14:49:40 [ll101] Bucardo info [28054]: MCP Got NOTICE
kick_sync_event_sync_inf_02 from 7130 (line 980) #local1,devel
2015-02-27 14:49:40 [ll101] Bucardo info [28054]: MCP Got NOTICE
kick_sync_event_sync_inf_02 4 times from 7131 (and 2 other PIDs) (line
980) #local1,devel
2015-02-27 14:49:42 [ll101] Bucardo info [28054]: MCP Got NOTICE
kick_sync_event_sync_inf_02 3 times from 3668 (and 2 other PIDs) (line
980) #local1,devel
2015-02-27 14:49:42 [ll101] Bucardo info [28054]: MCP Got NOTICE
kick_sync_event_sync_inf_02 2 times from 7128 (line 980) #local1,devel
2015-02-27 14:49:44 [ll101] Bucardo info [28054]: MCP Got NOTICE
kick_sync_event_sync_inf_02 3 times from 7130 (line 980) #local1,devel
2015-02-27 14:49:44 [ll101] Bucardo info [28054]: MCP Got NOTICE
kick_sync_event_sync_inf_02 2 times from 7128 (line 980) #local1,devel
2015-02-27 14:49:46 [ll101] Bucardo info [28054]: MCP Got NOTICE
kick_sync_event_sync_inf_02 from 7128 (line 980) #local1,devel
2015-02-27 14:49:46 [ll101] Bucardo info [28054]: MCP Got NOTICE
kick_sync_event_sync_inf_02 4 times from 7130 (and 2 other PIDs) (line
980) #local1,devel
>>>
Is there something I can do to make things working without
removing/adding the sync corresponding to stalled events in bucardo ?
Thanks and regards,
Sylvain
More information about the Bucardo-general
mailing list