[Bucardo-general] master slave not syncing after db upgrade. (triggers disabled)

Mon Aug 13 13:53:29 UTC 2012

On Sat, 11 Aug 2012 18:18:25 -0500
Rosser Schwarz <rosser.schwarz at gmail.com> wrote:

> Can you connect to your Bucardo database and say "SELECT
> validate_all_syncs();" and report what happens?
> 
> rls
> 
> On Mon, Aug 6, 2012 at 3:57 PM, jtkells <jtkells at verizon.net> wrote:
> > On Mon, 6 Aug 2012 09:27:32 -0400
> > jtkells <jtkells at verizon.net> wrote:
> >
> >> Hi,
> >>
> >> I'm having a bit of trouble here getting a master to slave
> >> replication environment working after a database schema upgrade.
> >> I am using Bucardo 4.4.8 on a PostgreSQL 8.4.8 database.
> >>
> >> I have been running this master slave configuration for a long
> >> time. We recently updated our schema (adding a lot of new columns
> >> etc. to a lot of tables). To accommodate these changes I performed
> >> the following steps:
> >> 1) I stop bucardo
> >> 2) I remove all tables from bucardo
> >> 3) I remove the herd that these tables belonged to
> >> On the database side (Master)
> >> I drop the schema and recreate the schema and all its tables (new
> >> columns)
> >> I load the tables through program code which generates millions of
> >> records to these tables (100).
> >> I do a pg_dump of this schema and copy it over to the slave
> >> database.
> >> On the slave database:
> >> I drop all the replicated tables and run pg_restore.
> >>
> >> On both systems I analyze these tables.
> >> On the master database:
> >> 4) I add the tables back into bucardo
> >> 5) I create the herd for them
> >> 6) and I start bucardo
> >> Bucardo goes through checks and generates the following record for
> >> each of the tables
> >> [Mon Aug  6 09:16:07 2012]  CTL   Herd member 19494511:
> >> ac_5300_18b_esri.fence
> >> [Mon Aug  6 09:16:07 2012]  CTL     Target oids: agis_slave:4480021
> >>
> >> I update some columns in a table to test replication and nothing
> >> happens. I have tried to do several commands to get bucardo to
> >> start processing the new changes (reload, kick etc.) but still
> >> nothing. I suspect the "Latest bad reason: Controller cleaning
> >> out unstarted q entry" is causing the problem, but I'm not sure how
> >> to fix it. Should I have deleted the syncs?
> >>
> >>
> >> Name     Type  State PID   Last_good Time  I/U/D Last_bad Time
> >> ========+=====+=====+=====+=========+=====+=====+========+====
> >> agis_18b| P   |idle |12596|4m39s    |9s   |0/0/0|25m49s  |0s
> >>
> >>
> >> Sync: agis_18b  (pushdelta)  esri18b =>  agis_slave  (Active)
> >>
> >>
> >> postgres at arp-db:~$ bucardo_ctl status agis_18b
> >> Days back: 3  User: bucardo  Database: bucardo
> >> ======================================================================
> >> Sync name:            agis_18b
> >> Current state:        idle (PID = 12596)
> >> Type:                 pushdelta
> >> Source herd/database: esri18b / agis_master
> >> Target database:      agis_slave
> >> Tables in sync:       100
> >> Last good:            5m 25s (time to run: 9s)
> >> Last good time:       Aug 06, 2012 09:16:17  Target: agis_slave
> >> Ins/Upd/Del:          0 / 0 / 0
> >> Last bad:             26m 35s (time to run: 0s)
> >> Last bad time:        Aug 06, 2012 08:55:07  Target: agis_slave
> >> Latest bad reason: Controller cleaning out unstarted q entry
> >> PID file:             /tmp/bucardo.ctl.sync.agis_18b.pid
> >> PID file created:     Mon Aug  6 09:16:07 2012
> >> Status:               active
> >> Limitdbs:             0
> >> Priority:             0
> >> Checktime:            none
> >> Overdue time:         00:00:00
> >> Expired time:         00:00:00
> >> Stayalive:            yes      Kidsalive: yes
> >> Rebuild index:        0        Do_listen: no
> >> Ping:                 yes      Makedelta: no
> >> Onetimecopy:          0
> >>
> >>
> >>
> >> Thanking you in advance
> >
> >
> > On further investigation, I updated some records and saw that no
> > entries were created in the q table.  There are triggers on the
> > tables but looking at the triggers in pg_trigger table I find that
> > the triggers are disabled (tgenabled = FALSE in pg_trigger table).
> > What process did I miss in bucardo that caused this (I dropped the
> > tables and herd)?  If I didn't miss anything is it safe to enable
> > these triggers at the PostgreSQL level and is there anything else I
> > need to do?  Also, was there anything else that I should have done
> > when I was removing the tables and herds in the first place?
> >
> > Thanking you in advance

Rosser,

bucardo=# select bucardo.validate_all_syncs();
 validate_all_syncs 
--------------------
                  1

I tried the bucardo_ctl validate sync command back when I was having the
problem and it reported no issues.
I had stated that the triggers weren't enabled, but I was wrong. The
column tgenabled in pg_trigger showed "o", which I assumed meant false,
but I later realized that "o" stands for origin and that the triggers
were enabled. So for now I'm not sure why replication stalled or how it
started working again at a later point in time.
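
For anyone else who trips over the same column, here is a minimal
sketch of how the tgenabled codes could be decoded. The code meanings
follow the PostgreSQL pg_trigger catalog documentation (8.3 and later).
The helper names and the query are illustrative only, and the
'bucardo%' trigger name pattern is an assumption that should be checked
against the actual trigger names on your tables.

```python
# Meaning of the single-character pg_trigger.tgenabled codes
# (PostgreSQL 8.3 and later).
TGENABLED_MEANING = {
    "O": "enabled: fires in origin and local modes",
    "D": "disabled",
    "R": "enabled: fires in replica mode only",
    "A": "enabled: fires always",
}

# Hypothetical query to list trigger states per table. The 'bucardo%'
# name pattern is an assumption, not something confirmed in this thread.
TRIGGER_QUERY = """
SELECT c.relname, t.tgname, t.tgenabled
FROM pg_trigger t
JOIN pg_class c ON c.oid = t.tgrelid
WHERE t.tgname LIKE 'bucardo%'
ORDER BY c.relname, t.tgname;
"""


def describe_tgenabled(code):
    """Return a readable description of a tgenabled code."""
    return TGENABLED_MEANING.get(code.upper(), "unknown code: %r" % code)


def fires_in_origin_mode(code):
    """True if the trigger fires in an ordinary (origin) session."""
    return code.upper() in ("O", "A")


if __name__ == "__main__":
    for code in ("O", "D", "R", "A"):
        print(code, "=>", describe_tgenabled(code))
```

With this mapping, a value of O means the trigger is enabled for normal
sessions, and only D means it is actually disabled.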

I will be repeating this step within the next few days and will be able
to watch the process and outcome more closely.