[Bucardo-general] Bucardo-general Digest, Vol 78, Issue 7

Thu Mar 13 13:58:18 UTC 2014

Agreed, that is a problem for us too.
We ended up creating a cron entry to run a custom checkup script.
What is also nasty is that in a multi-sync setup, if any one of the
syncs breaks, it kills all of the syncs. Boy, do we hate that.

Here is our cron and script:

############################################################################
## BUCARDO CRON SCRIPTS:
# Keep bucardo running, check every 10 min, but not b/t 0:00AM - 3:59AM
*/10 4-23  * * *    root  /home/postgres/BST_Bucardo_keepitup.sh

#bucardo ping
*/2 *   * * *    root  psql -d orca_cms_main -U postgres -c "UPDATE clientdata.persons SET datemodified = NOW() WHERE personid = 1;" >> /home/postgres/log_bucardo_ping
############################################################################

Script:
[code]
#!/bin/bash

#==========================================================================
# Settings

# Log file used to document the automatic bucardo restarts
LogFile="/home/postgres/bucardo.keepitup.restarts";

# This temp file is where we will temporarily store the process list
TempFile="/tmp/bucardo_processes";

# This is the IP of the remote computer that bucardo needs to connect to
RemoteHost="10.0.1.20";

###########################################################################
# ENSURE BUCARDO STAYS RUNNING!
###########################################################################
#
#1. ensure using postgres user
#2. create/copy this file BST_bucardo_keepitup.sh
        # on Cloud-DB:
           # sudo nano /home/postgres/BST_bucardo_keepitup.sh
           # paste text of this doc into file
#3. sudo chmod +x /home/postgres/BST_bucardo_keepitup.sh
        # sudo chmod 771 /home/postgres/BST_bucardo_keepitup.sh
#4. test file: ./BST_bucardo_keepitup.sh
#5. set up cron (every 5 minutes):
        # crontab -e
        # */5 * * * * /home/postgres/BST_bucardo_keepitup.sh
#
###########################################################################
#
#check /home/postgres/log.bucardo
#if it has not been modified in the last 5 minutes, stop bucardo, kill any
#leftover Master Control Programs, and start bucardo again

# Has the log file NOT been modified in the last 5 minutes?
if [ "$(find -P -O3 /home/postgres -maxdepth 1 -mmin -5 -name log.bucardo | wc -l)" -eq 0 ] ; then

        # First we will instruct bucardo to stop
        bucardo_ctl stop;

        # Lets wait 3 seconds for it to do its thing
        sleep 3;

        # This command prints the process details for only the existing
        # Bucardo Master Control Programs. We save it to a temp file because
        # this listing is volatile.
        ps -Af | grep -i "Bucardo Master Control Program" | grep -v -i grep > "$TempFile";

        # Save the number of results into Count.
        Count=`cat $TempFile | wc -l`;

        # Loop for each running control program
        for (( c=1; c<=${Count}; c++ ))
        do

                # Extract the current specific line from the process list
                PID=`cat $TempFile | sed -n -e ${c}p`;

                # Convert the string into an array so we can access its items
                # by index
                IFS=' ' read -a array <<< "$PID";

                # Extract only the PID, which is the second string in the line
                # (index 1)
                PID="${array[1]}";

                # Lets force kill this process
                kill -9 $PID;

                # Let's log all the processes we had to force kill
                echo $(date +"[%m/%d/%Y %H:%M:%S] ") "Could not stop Process $PID, Killing it!" >> $LogFile

        done;

        # Before starting up Bucardo, let's ensure that we can ping the
        # remote computer
        ping -c 1 $RemoteHost &> /dev/null;
        if (( $? )); then

                # Ping failed so let's log this event
                echo $(date +"[%m/%d/%Y %H:%M:%S] ") "Remote computer offline. Bucardo restart delayed..." >> $LogFile;
        else
                # Ping succeeded so we log the fact that we will try to
                # restart bucardo
                echo $(date +"[%m/%d/%Y %H:%M:%S] ") "Bucardo Restarting!" >> $LogFile;

                # Now that all the Bucardo Master Control Programs have
                # stopped, let's restart bucardo
                bucardo_ctl start;
        fi;

fi;

[/code]
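
The staleness test and the PID extraction in the script above can also be written as two small functions, which avoids the temp file and the `grep -v grep` step. This is only a sketch, not a drop-in replacement: the function names `log_is_stale` and `mcp_pids` are made up for illustration, and it assumes GNU find and that the process title contains "Bucardo Master Control Program", as the script above does.

```shell
#!/bin/bash
# Sketch only: helper functions. Nothing here stops or starts Bucardo.
# log_is_stale and mcp_pids are hypothetical names, not part of Bucardo.

# Return 0 (true) if FILE is missing or was not modified in the last
# MINUTES minutes. Relies on the -mmin test (GNU/BSD find).
log_is_stale() {
    local file="$1" minutes="$2"
    [ -e "$file" ] || return 0
    # find prints the path only if it was modified less than MINUTES minutes ago
    [ -z "$(find "$file" -maxdepth 0 -mmin "-${minutes}" -print)" ]
}

# Read "ps -Af"-style lines on stdin and print the PID (second column) of
# every line that mentions the Bucardo Master Control Program.
# The [b] in the pattern keeps this awk process from matching its own
# command line when it shows up in the ps listing.
mcp_pids() {
    awk 'tolower($0) ~ /[b]ucardo master control program/ { print $2 }'
}
```

Used as `log_is_stale /home/postgres/log.bucardo 5 && ps -Af | mcp_pids`, this prints the PIDs that the script would go on to kill.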

-----Original Message-----
From: bucardo-general-bounces at bucardo.org
[mailto:bucardo-general-bounces at bucardo.org] On Behalf Of
bucardo-general-request at bucardo.org
Sent: Wednesday, March 12, 2014 2:34 PM
To: bucardo-general at bucardo.org
Subject: Bucardo-general Digest, Vol 78, Issue 7

Today's Topics:

   1. Bucardo breaks when it cannot contact the	downstream database
      (Smitha Pamujula)

----------------------------------------------------------------------

Message: 1
Date: Wed, 12 Mar 2014 11:34:02 -0700
From: Smitha Pamujula <smitha.pamujula at iovation.com>
To: bucardo-general at bucardo.org
Subject: [Bucardo-general] Bucardo breaks when it cannot contact the
	downstream database
Message-ID:
	<CAGWGGXOemT8mb_8hH3riYF4ami3b92fYNwjPxpQ2cL1sd3y3gw at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hello all

We have a Bucardo setup in our environment where one Bucardo server is
replicating to various downstream databases. When one of the downstream
databases goes down, Bucardo on the master starts behaving strangely:
all the kid processes die, and the MCP tries to bring them up, but they die
again. This goes on until the downstream comes back up (please see ERROR
1 below).

I have tried to deactivate the sync before I shut down the downstream, but it
doesn't seem to help. Next I tried to remove the sync before shutting down the
downstream and add it back after bringing it back up. Bucardo was fine with
this in that the remaining kids kept working, but when I added the sync
back and reloaded, kicked, etc., the sync was broken. Bucardo says it
cannot kick an inactive sync, but the status shows it as active (please see
ERROR 2 below). Please help.

*ERROR 1*

TERSE (15087) [Wed Mar 12 18:27:23 2014] MCP Connecting to database "pdx_rptcom01_db" (target)
WARN (15087) [Wed Mar 12 18:27:23 2014] MCP Warning: Killed (line 44): DBI connect('dbname=reporting;host=pdxqarptcom01.iovationnp.com','bucardo',...) failed: could not connect to server: Connection refused Is the server running on host "pdxqarptcom01.iovationnp.com" (10.4.32.124) and accepting TCP/IP connections on port 5432? at /usr/share/perl5/vendor_perl/Bucardo.pm line 5082
TERSE (15087) [Wed Mar 12 18:27:23 2014] MCP Database problem, will respawn after a short sleep: 15
TERSE (15087) [Wed Mar 12 18:27:25 2014] MCP End of cleanup_mcp. Sys time: Wed Mar 12 11:27:25 2014. Database time: 2014-03-12 18:27:25.291993+00
TERSE (15087) [Wed Mar 12 18:27:25 2014] MCP Sleep time: 15
TERSE (15087) [Wed Mar 12 18:27:40 2014] MCP Respawn attempt: /usr/sbin/bucardo  start 'Attempting automatic respawn after MCP death'
WARN (15107) [Wed Mar 12 18:27:40 2014] MCP Starting Bucardo version 4.99.10
WARN (15107) [Wed Mar 12 18:27:40 2014] MCP Log level: terse
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP Master DB Local epoch: 1394648860.58464  DB epoch: 1394648860.58469
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP Master DB Local time: Wed Mar 12 11:27:40 2014  DB time: 2014-03-12 18:27:40.584692+00
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP Master DB Local timezone: PDT (-0700)  DB timezone: UTC
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP Master DB Postgres version: 90207
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP Master DB Database port: 5432
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP PID: 15108
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP Postgres backend PID: 15109
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP Postgres library version: 80412
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP bucardo: /usr/sbin/bucardo
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP Bucardo.pm: /usr/share/perl5/vendor_perl/Bucardo.pm
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP OS: linux  Perl: /usr/bin/perl 5.10.1
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP DBI version: 1.609  DBD::Pg version: 2.15.1 (21501) DBIx::Safe version: 1.2.5
TERSE (15108) [Wed Mar 12 18:27:40 2014] MCP Bucardo object:
 batch               => '0'
 bcverbose           => '1'
 created             => 'Wed Mar 12 11:27:40 2014'
 dbdpgversion        => '21501'
 dbhlist             => 'HASH(0x24c2598)'
 dbname              => 'bucardo'
 dbpass              => '<not shown>'
 dbuser              => 'bucardo'
 dryrun              => '0'
 exit_on_nosync      => '0'
 extraname           => ''
 listening           => 'HASH(0x211f868)'
 logclean            => '0'
 logcodes            => [sub { "DUMMY" }]
 logdest             => ['/var/log/bucardo']
 logextension        => ''
 logpid              => '15108'
 logprefix           => 'MCP'
 logseparate         => '0'
 masterdbh           => 'DBI::db=HASH(0x2177f18)'
 mcp_backend         => '15109'
 mcp_clock_timestamp => 'clock_timestamp()'
 mcppid              => '15087'
 pidfile             => '/var/run/bucardo/bucardo.mcp.pid'
 pidmap              => 'HASH(0x24b68e8)'
 sendmail            => '0'
 sendmail_file       => ''
 sqlprefix           => '/* Bucardo 4.99.10 */'
 stopfile            => '/var/run/bucardo/fullstopbucardo'
 verbose             => '1'
 version             => '4.99.10'
 warning_file        => ''
WARN (15108) [Wed Mar 12 18:27:40 2014] MCP Bucardo config:
 autosync_ddl              => 'newcol'
 bucardo_current_version   => '4.99.10'
 bucardo_vac               => '1'
 bucardo_version           => '4.99.6'
 ctl_checkonkids_time      => '10'
 ctl_createkid_time        => '0.5'
 ctl_sleep                 => '0.2'
 default_conflict_strategy => 'bucardo_latest'
 default_email_from        => 'nobody at example.com'
 default_email_host        => 'localhost'
 default_email_to          => 'nobody at example.com'
 email_debug_file          => ''
 endsync_sleep             => '1.0'
 flatfile_dir              => '.'
 host_safety_check         => ''
 isolation_level           => 'repeatable read'
 kid_deadlock_sleep        => '0'
 kid_nodeltarows_sleep     => '0.5'
 kid_pingtime              => '60'
 kid_restart_sleep         => '1'
 kid_serial_sleep          => '0'
 kid_sleep                 => '0.5'
 log_conflict_file         => '/var/log/bucardo/bucardo_conflict.log'
 log_level                 => 'terse'
 log_level_number          => '1'
 log_microsecond           => '0'
 log_showlevel             => '1'
 log_showline              => '0'
 log_showpid               => '1'
 log_showsyncname          => '1'
 log_showtime              => '2'
 mcp_dbproblem_sleep       => '15'
 mcp_loop_sleep            => '0.2'
 mcp_pingtime              => '60'
 mcp_vactime               => '60'
 piddir                    => '/var/run/bucardo'
 quick_delta_check         => '1'
 reason_file               => '/var/log/bucardo/bucardo.restart.reason.txt'
 reload_config_timeout     => '30'
 semaphore_table           => 'bucardo_status'
 statement_chunk_size      => '10000'
 stats_script_url          => 'http://www.bucardo.org/'
 stopfile                  => 'fullstopbucardo'
 syslog_facility           => 'log_local1'
 tcp_keepalives_count      => '0'
 tcp_keepalives_idle       => '0'
 tcp_keepalives_interval   => '0'
 vac_run                   => '30'
 vac_sleep                 => '120'
 warning_file              => ''
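
The "Connection refused" in the log above suggests the host itself answered but nothing was accepting connections on port 5432, a case that an ICMP ping would not catch. A minimal sketch of a TCP probe that could be run before restarting or re-adding a sync follows. It assumes bash built with /dev/tcp support and the coreutils `timeout` command; the function name `port_is_open` is hypothetical and not part of Bucardo.

```shell
#!/bin/bash
# Sketch only: port_is_open is a hypothetical helper, not a Bucardo command.
# Returns 0 if a TCP connection to HOST:PORT opens within TIMEOUT seconds,
# non-zero otherwise (refused, unreachable, or timed out).
port_is_open() {
    local host="$1" port="${2:-5432}" timeout_s="${3:-3}"
    # The child bash receives HOST as $0 and PORT as $1; opening /dev/tcp/...
    # makes bash attempt the TCP connection.
    timeout "$timeout_s" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For the target in the log, this would be called as `port_is_open pdxqarptcom01.iovationnp.com 5432 || echo "downstream not accepting connections"`.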

*ERROR 2*
bucardo kick pdx_configuration_reporting_sync
Kicked sync pdx_configuration_reporting_sync

From the log:
TERSE (15108) [Wed Mar 12 18:31:13 2014] MCP Cannot kick inactive sync "pdx_configuration_reporting_sync"

From the bucardo status:
bucardo status pdx_configuration_reporting_sync
======================================================================
Sync name                : pdx_configuration_reporting_sync
Current state            : No records found
Source relgroup/database : configuration_reporting_rels / pdx_configuration_db
Tables in sync           : 16
Status                   : Active
Check time               : None
Overdue time             : 00:00:00
Expired time             : 00:00:00
Stayalive/Kidsalive      : Yes / Yes
Rebuild index            : No
Autokick                 : Yes
Onetimecopy              : No
Post-copy analyze        : Yes
Last error:              :
======================================================================