[Bucardo-general] Hung problem..

Michelle Sullivan michelle at sorbs.net
Tue Apr 15 11:16:34 UTC 2014


My Bucardo keeps hanging. When it does, the process list looks like
this:

 5487 ?        S      0:00 Bucardo Kid. Sync "rt4db"
 5488 ?        Ss     0:01 postgres: bucardo bucardo [local] notify interrupt
16485 ?        Ss     0:00 postgres: pgsql bucardo [local] idle
19100 pts/0    S+     0:00 grep -i buc
20179 ?        S      0:02 Bucardo Kid. Sync "dnsmm"
20180 ?        Ss     0:03 postgres: bucardo bucardo [local] idle
25607 ?        S      0:10 Bucardo Kid. Sync "rt4seq"
25609 ?        Ss     0:04 postgres: bucardo bucardo [local] notify interrupt
25614 ?        S      0:11 Bucardo Kid. Sync "sorbsmmseq"
25615 ?        Ss     0:11 postgres: bucardo bucardo [local] notify interrupt
25617 ?        S      0:01 Bucardo Kid. Sync "rt3db"
25618 ?        Ss     0:09 postgres: bucardo bucardo [local] notify interrupt
25621 ?        S      0:12 Bucardo Kid. Sync "rt3seq"
25622 ?        Ss     0:06 postgres: bucardo bucardo [local] notify interrupt
25629 ?        S      0:06 Bucardo Kid. Sync "rt4db"
25630 ?        Ss     0:18 postgres: bucardo bucardo [local] notify interrupt
30978 ?        S      0:01 Bucardo Kid. Sync "sessions"
30979 ?        Ss     0:09 postgres: bucardo bucardo [local] notify interrupt
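
A rough sketch of how the stuck pairs in a listing like the one above
could be picked out automatically. It is only an illustration, not part
of Bucardo: the function name find_stuck_pairs is hypothetical, and it
assumes that each Kid's backend is the postgres process listed directly
after it and that "notify interrupt" in the process title marks the hung
state. Wrapped continuation lines (as mail clients produce) are joined
back onto the previous process line.

```python
import re

# Assumed line shape, taken from the listing above: PID TTY STAT TIME COMMAND
PS_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\d+:\d+)\s+(.*)$")
KID = re.compile(r'^Bucardo Kid\. Sync "([^"]+)"')


def find_stuck_pairs(ps_text):
    """Return (kid_pid, backend_pid, sync_name) tuples.

    Heuristic (an assumption, not a Bucardo rule): a Kid is reported when
    the very next process in the listing is a postgres backend whose
    title ends with 'notify interrupt'.
    """
    rows = []
    for line in ps_text.splitlines():
        if not line.strip():
            continue
        match = PS_LINE.match(line)
        if match:
            rows.append((int(match.group(1)), match.group(5).strip()))
        elif rows:
            # Continuation of a wrapped line: glue it onto the previous command.
            pid, cmd = rows[-1]
            rows[-1] = (pid, cmd + " " + line.strip())

    stuck = []
    for (pid, cmd), (next_pid, next_cmd) in zip(rows, rows[1:]):
        kid = KID.match(cmd)
        if (
            kid
            and next_cmd.startswith("postgres:")
            and next_cmd.endswith("notify interrupt")
        ):
            stuck.append((pid, next_pid, kid.group(1)))
    return stuck
```

Fed the listing above, this would report the "rt4db" Kid 5487 with
backend 5488, but not the "dnsmm" Kid 20179, whose backend is idle.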


The only way to resolve it is kill -9 on both the DB process and the
Bucardo Kid. Any clues as to what is wrong?

Killing only the DB processes (or restarting the DB with -m immediate)
leaves the Kids hanging indefinitely.

My suspicion is that the initial cause is the interconnect between the
DBs going away silently and then returning (it's a VPN between data
centers). But how do I stop Bucardo failing to recover?  What is causing
the lockup inside Bucardo?
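
For context on the "going away silently" suspicion: a peer that
disappears without closing the connection is normally only noticed when
something probes the link, and TCP keepalive is one general mechanism
for that. The sketch below shows the idea at the socket level only. It
is not Bucardo code and not a confirmed fix for this hang; the function
name enable_keepalive and the timing values are assumptions chosen for
illustration, and the three TCP_KEEP* options are platform-specific
(available on Linux).

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for an existing socket.

    idle, interval and count are illustrative values, not recommendations:
    probe after `idle` seconds of silence, every `interval` seconds, and
    give up after `count` unanswered probes.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The fine-grained knobs are not present on every platform.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock
```

With probes enabled, a read blocked on a dead link eventually fails with
an error instead of waiting forever, which is the behaviour missing in
the hang described here if the suspicion is right.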

DB is Pg 8.4.10 on all multi-master nodes on CentOS, bucardo version
4.99.10.

Regards,

Michelle


