[Bucardo-general] Bucardo-general Digest, Vol 62, Issue 10

Jonathan Brinkman jb at blackskytech.com
Fri Nov 9 15:12:53 UTC 2012


Yes, it's the remote database that is sometimes unreachable.

Stayalive and kidsalive are both true:
Sync name:            cmsvog_swap_main_and_gate
Current state:        idle (PID = 21547)
Type:                 swap
Source herd/database: cmsvog_main_and_gate / cmsvog_main
Target database:      cmsvog_gate
Tables in sync:       21
Last good:            36s (time to run: 2s)
Last good time:       Nov 09, 2012 09:39:40  Target: cmsvog_gate
Ins/Upd/Del:          1 / 0 / 0
Last bad:             20m 40s (time to run: 16h 23m 49s)
Last bad time:        Nov 09, 2012 09:19:36  Target: cmsvog_gate
Latest bad reason:    MCP removing stale q entry
PID file:             /var/run/bucardo/bucardo.ctl.sync.cmsvog_swap_main_and_gate.pid
PID file created:     Fri Nov  9 09:21:29 2012
Status:               active
Limitdbs:             0
Priority:             0
Checktime:            none
Overdue time:         00:00:00
Expired time:         00:00:00
Stayalive:            yes      Kidsalive: yes
Rebuild index:        0        Do_listen: yes
Ping:                 yes      Makedelta: no
Onetimecopy:          0

------------------------------

Here are the final lines from log.bucardo before it died. It looks like the
last successful sync was at 16:56:01.

[Thu Nov  8 16:56:01 2012]  KID Got a notice for cmsvog_pushdelta_main_to_gate: cmsvog_main -> cmsvog_gate
[Thu Nov  8 17:11:55 2012]  KID Final database backend PID is 17484
[Thu Nov  8 17:11:55 2012]  KID Kid exiting at cleanup_kid. Reason: DBD::Pg::db do failed: SSL SYSCALL error: No route to host at /usr/local/share/perl/5.10.1/Bucardo.pm line 4314.
 main error: none source error: none target error: 7 States://22000
[Thu Nov  8 17:11:55 2012]  KID Removed pid file "/var/run/bucardo/bucardo.kid.sync.cmsvog_pushdelta_main_to_gate.cmsvog_gate.pid"
[Thu Nov  8 17:12:01 2012]  CTL Rows updated child 2209 to aborted in q: 1
[Thu Nov  8 17:12:01 2012]  CTL Warning! Kid 2209 seems to have died. Sync "cmsvog_pushdelta_main_to_gate"
[Thu Nov  8 17:12:11 2012]  CTL Cleaning up aborted sync from q table for "cmsvog_gate". PID was 2209
[Thu Nov  8 17:12:11 2012]  CTL Re-adding sync to q table for database "cmsvog_gate"
[Thu Nov  8 17:12:12 2012]  CTL Creating kid to handle resurrected q row
[Thu Nov  8 17:12:12 2012]  CTL Created new kid 17496 for sync "cmsvog_pushdelta_main_to_gate" to database "cmsvog_gate"
[Thu Nov  8 17:12:12 2012]  KID New kid, syncs "cmsvog_main" to "cmsvog_gate" for sync "cmsvog_pushdelta_main_to_gate" alive=1 Parent=2203 Type=pushdelta
[Thu Nov  8 17:12:12 2012]  KID PID: 17496
[Thu Nov  8 17:12:12 2012]  KID Bucardo database backend PID is 17497
[Thu Nov  8 17:12:12 2012]  KID Source database backend PID is 17498
[Thu Nov  8 17:12:13 2012]  KID Final database backend PID is 17499
[Thu Nov  8 17:12:13 2012]  KID Kid exiting at cleanup_kid. Reason: DBI connect('dbname=vog_cms_gate;port=5432;host=10.0.1.38','bucardo',...) failed: could not connect to server: No route to host
        Is the server running on host "10.0.1.38" and accepting
        TCP/IP connections on port 5432? at /usr/local/share/perl/5.10.1/Bucardo.pm line 267
[Thu Nov  8 17:12:13 2012]  KID Removed pid file "/var/run/bucardo/bucardo.kid.sync.cmsvog_pushdelta_main_to_gate.cmsvog_gate.pid"
[Thu Nov  8 17:12:22 2012]  CTL Warning! Kid 17496 seems to have died. Sync "cmsvog_pushdelta_main_to_gate"
[Thu Nov  8 17:16:16 2012]  KID Final database backend PID is 17638
[Thu Nov  8 17:16:16 2012]  KID Kid exiting at cleanup_kid. Reason: Ping failed for source database cmsvog_replication
[Thu Nov  8 17:16:16 2012]  KID Removed pid file "/var/run/bucardo/bucardo.kid.sync.vog_pushdelta_cms_replication.cmsvog_main.pid"
[Thu Nov  8 17:16:21 2012]  CTL Warning! Kid 15697 seems to have died. Sync "vog_pushdelta_cms_replication"
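The restart-if-stale watchdog mentioned in the earlier message (quoted below) comes down to one check: how old is the newest timestamped entry in log.bucardo? Below is a minimal Python sketch of only that check. It assumes the log keeps the `[Thu Nov  8 17:16:21 2012]` prefix seen above, that those timestamps are local time, and that day and month names are English (C locale). The function names, the 200-line tail and the 10-minute default are illustrative, not anything Bucardo provides, and the restart step is deliberately left out rather than guessing at the right command.

```python
"""Hypothetical staleness check for log.bucardo (a sketch, not part of Bucardo)."""
from collections import deque
from datetime import datetime, timedelta
import re

# Log lines in the excerpt start with a ctime-style stamp such as
# "[Thu Nov  8 17:16:21 2012]". Wrapped continuation lines have no stamp.
STAMP_RE = re.compile(r"^\[([^\]]+)\]")
# %a and %b only parse English day/month names (C or English locale assumed).
STAMP_FMT = "%a %b %d %H:%M:%S %Y"


def last_log_time(lines):
    """Return the newest bracketed timestamp in lines, or None if none parse."""
    for line in reversed(list(lines)):
        match = STAMP_RE.match(line)
        if not match:
            continue  # continuation line, e.g. a wrapped pid file path
        # Collapse the double space in space-padded days ("Nov  8").
        stamp = " ".join(match.group(1).split())
        try:
            return datetime.strptime(stamp, STAMP_FMT)
        except ValueError:
            continue  # bracketed text that is not a timestamp
    return None


def is_stale(lines, now, max_age=timedelta(minutes=10)):
    """True when no timestamped line is younger than max_age."""
    newest = last_log_time(lines)
    return newest is None or now - newest > max_age


def check_log_file(path, max_age=timedelta(minutes=10), tail=200):
    """Read only the last `tail` lines of the log and test them.

    `path` and `tail` are illustrative parameters; log timestamps are
    assumed to be in the same local time zone as datetime.now().
    """
    with open(path, encoding="utf-8", errors="replace") as handle:
        recent = deque(handle, maxlen=tail)
    return is_stale(recent, datetime.now(), max_age)


if __name__ == "__main__":
    excerpt = [
        "[Thu Nov  8 17:16:16 2012]  KID Removed pid file",
        '"/var/run/bucardo/bucardo.kid.sync.vog_pushdelta_cms_replication.cmsvog_main.pid"',
        "[Thu Nov  8 17:16:21 2012]  CTL Warning! Kid 15697 seems to have died.",
    ]
    print(last_log_time(excerpt))  # 2012-11-08 17:16:21
    # Checked at the time of the earlier message (Nov 9, 09:17:57): stale.
    print(is_stale(excerpt, now=datetime(2012, 11, 9, 9, 17, 57)))  # True
```

Run periodically (for example from cron), this only answers "has the log gone quiet?"; a quiet log is a heuristic for a dead daemon, not proof that the MCP has exited.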

-----Original Message-----
From: Greg Sabino Mullane [mailto:greg at endpoint.com] 
Sent: Friday, November 09, 2012 9:30 AM
To: Jonathan Brinkman
Cc: bucardo-general at bucardo.org
Subject: Re: [Bucardo-general] Bucardo-general Digest, Vol 62, Issue 10

On Fri, Nov 09, 2012 at 09:17:57AM -0500, Jonathan Brinkman wrote:
> There is no question Bucardo is stopping and not restarting without 
> manual intervention. I'm thinking about creating a bash script that 
> restarts bucardo if the latest log record in log.bucardo is dated
> longer than, say, 10 minutes ago. I'd hoped not to have to do that.

Yeah, that's ugly.

> 	[Thu Nov  8 17:16:16 2012]  KID Removed pid file "/var/run/bucardo/bucardo.kid.sync.vog_pushdelta_cms_replication.cmsvog_main.pid"
> 	[Thu Nov  8 17:16:21 2012]  CTL Warning! Kid 15697 seems to have died. Sync "vog_pushdelta_cms_replication"

Is there more than that? I would expect the MCP to log stuff as it leaves as
well. Does this sync have stayalive and kidsalive as true? 
To be clear, it's the remote database(s) that are sometimes unreachable, not
the main Bucardo database, right?

--
Greg Sabino Mullane greg at endpoint.com
End Point Corporation
PGP Key: 0x14964AC8


