[Bucardo-general] Bucardo breaks when it cannot contact the downstream database

Wed Mar 12 18:34:02 UTC 2014

Hello all

We have a Bucardo setup in our environment in which one Bucardo server
replicates to several downstream databases. When one of the downstream
databases goes down, Bucardo on the master starts misbehaving: all the
kid processes die, the MCP tries to bring them back up, and they die
again. This continues until the downstream comes back up (please see
ERROR 1 below).
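
To rule out anything other than the downstream simply being unreachable
(the "Connection refused" in ERROR 1 below), a plain TCP check from the
Bucardo host can be used. This is only a minimal sketch and is not part
of Bucardo; the function name port_is_open is made up for illustration,
and the host and port in the usage comment are the ones from the log.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


# Example usage (host and port taken from the log in ERROR 1):
#   port_is_open("pdxqarptcom01.iovationnp.com", 5432)
```

This only shows whether something is listening on the port; it says
nothing about whether Postgres would accept the bucardo login.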

I have tried deactivating the sync before shutting down the downstream,
but that does not seem to help. Next I tried removing the sync before
shutting down the downstream and adding it back after bringing the
downstream back up. Bucardo was fine with this, in that the remaining
kids kept running, but after I added the sync back and reloaded, kicked,
etc., the sync is now broken. Bucardo says it cannot kick an inactive
sync, yet the status shows the sync as active (please see ERROR 2
below). Please help.

*ERROR 1*

TERSE (15087) [Wed Mar 12 18:27:23 2014] MCP Connecting to database "pdx_rptcom01_db" (target)
WARN (15087) [Wed Mar 12 18:27:23 2014] MCP Warning: Killed (line 44): DBI connect('dbname=reporting;host=pdxqarptcom01.iovationnp.com','bucardo',...) failed: could not connect to server: Connection refused
Is the server running on host "pdxqarptcom01.iovationnp.com" (10.4.32.124) and accepting
TCP/IP connections on port 5432? at /usr/share/perl5/vendor_perl/Bucardo.pm line 5082
TERSE (15087) [Wed Mar 12 18:27:23 2014] MCP Database problem, will respawn after a short sleep: 15
TERSE (15087) [Wed Mar 12 18:27:25 2014] MCP End of cleanup_mcp. Sys time: Wed Mar 12 11:27:25 2014. Database time: 2014-03-12 18:27:25.291993+00
TERSE (15087) [Wed Mar 12 18:27:25 2014] MCP Sleep time: 15
TERSE (15087) [Wed Mar 12 18:27:40 2014] MCP Respawn attempt: /usr/sbin/bucardo  start 'Attempting automatic respawn after MCP death'
WARN (15107) [Wed Mar 12 18:27:40 2014] MCP Starting Bucardo version 4.99.10
WARN (15107) [Wed Mar 12 18:27:40 2014] MCP Log level: terse
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP Master DB Local epoch: 1394648860.58464  DB epoch: 1394648860.58469
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP Master DB Local time: Wed Mar 12 11:27:40 2014  DB time: 2014-03-12 18:27:40.584692+00
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP Master DB Local timezone: PDT (-0700)  DB timezone: UTC
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP Master DB Postgres version: 90207
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP Master DB Database port: 5432
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP PID: 15108
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP Postgres backend PID: 15109
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP Postgres library version: 80412
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP bucardo: /usr/sbin/bucardo
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP Bucardo.pm: /usr/share/perl5/vendor_perl/Bucardo.pm
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP OS: linux  Perl: /usr/bin/perl 5.10.1
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP DBI version: 1.609  DBD::Pg version: 2.15.1 (21501) DBIx::Safe version: 1.2.5
TERSE (15108) [Wed Mar 12 18:27:40 2014] MCP Bucardo object:
 batch               => '0'
 bcverbose           => '1'
 created             => 'Wed Mar 12 11:27:40 2014'
 dbdpgversion        => '21501'
 dbhlist             => 'HASH(0x24c2598)'
 dbname              => 'bucardo'
 dbpass              => '<not shown>'
 dbuser              => 'bucardo'
 dryrun              => '0'
 exit_on_nosync      => '0'
 extraname           => ''
 listening           => 'HASH(0x211f868)'
 logclean            => '0'
 logcodes            => [sub { "DUMMY" }]
 logdest             => ['/var/log/bucardo']
 logextension        => ''
 logpid              => '15108'
 logprefix           => 'MCP'
 logseparate         => '0'
 masterdbh           => 'DBI::db=HASH(0x2177f18)'
 mcp_backend         => '15109'
 mcp_clock_timestamp => 'clock_timestamp()'
 mcppid              => '15087'
 pidfile             => '/var/run/bucardo/bucardo.mcp.pid'
 pidmap              => 'HASH(0x24b68e8)'
 sendmail            => '0'
 sendmail_file       => ''
 sqlprefix           => '/* Bucardo 4.99.10 */'
 stopfile            => '/var/run/bucardo/fullstopbucardo'
 verbose             => '1'
 version             => '4.99.10'
 warning_file        => ''
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP Bucardo config:
 autosync_ddl              => 'newcol'
 bucardo_current_version   => '4.99.10'
 bucardo_vac               => '1'
 bucardo_version           => '4.99.6'
 ctl_checkonkids_time      => '10'
 ctl_createkid_time        => '0.5'
 ctl_sleep                 => '0.2'
 default_conflict_strategy => 'bucardo_latest'
 default_email_from        => 'nobody at example.com'
 default_email_host        => 'localhost'
 default_email_to          => 'nobody at example.com'
 email_debug_file          => ''
 endsync_sleep             => '1.0'
 flatfile_dir              => '.'
 host_safety_check         => ''
 isolation_level           => 'repeatable read'
 kid_deadlock_sleep        => '0'
 kid_nodeltarows_sleep     => '0.5'
 kid_pingtime              => '60'
 kid_restart_sleep         => '1'
 kid_serial_sleep          => '0'
 kid_sleep                 => '0.5'
 log_conflict_file         => '/var/log/bucardo/bucardo_conflict.log'
 log_level                 => 'terse'
 log_level_number          => '1'
 log_microsecond           => '0'
 log_showlevel             => '1'
 log_showline              => '0'
 log_showpid               => '1'
 log_showsyncname          => '1'
 log_showtime              => '2'
 mcp_dbproblem_sleep       => '15'
 mcp_loop_sleep            => '0.2'
 mcp_pingtime              => '60'
 mcp_vactime               => '60'
 piddir                    => '/var/run/bucardo'
 quick_delta_check         => '1'
 reason_file               => '/var/log/bucardo/bucardo.restart.reason.txt'
 reload_config_timeout     => '30'
 semaphore_table           => 'bucardo_status'
 statement_chunk_size      => '10000'
 stats_script_url          => 'http://www.bucardo.org/'
 stopfile                  => 'fullstopbucardo'
 syslog_facility           => 'log_local1'
 tcp_keepalives_count      => '0'
 tcp_keepalives_idle       => '0'
 tcp_keepalives_interval   => '0'
 vac_run                   => '30'
 vac_sleep                 => '120'
 warning_file              => ''

*ERROR 2*
bucardo kick pdx_configuration_reporting_sync
Kicked sync pdx_configuration_reporting_sync

From the log:
TERSE (15108) [Wed Mar 12 18:31:13 2014] MCP Cannot kick inactive sync "pdx_configuration_reporting_sync"

From the bucardo status:
bucardo status pdx_configuration_reporting_sync
======================================================================
Sync name                : pdx_configuration_reporting_sync
Current state            : No records found
Source relgroup/database : configuration_reporting_rels / pdx_configuration_db
Tables in sync           : 16
Status                   : Active
Check time               : None
Overdue time             : 00:00:00
Expired time             : 00:00:00
Stayalive/Kidsalive      : Yes / Yes
Rebuild index            : No
Autokick                 : Yes
Onetimecopy              : No
Post-copy analyze        : Yes
Last error:              :
======================================================================