[Bucardo-general] Bucardo 5 install issue

Chris Keane chris.keane at zzgi.com
Mon Mar 12 16:01:01 UTC 2012


Another update:
I sucked it up and changed Bucardo.pm

         push @dbs_delta => $dbname
             if ($x->{role} eq 'target' || $x->{role} eq 'source')
                 and ! $sync->{onetimecopy}
                     and ! $sync->{fullcopy};

(added dbs with a role of source into @db_deltas)... fired off bucardo 
and hey presto replication.
OK, so now I know that, what did I do wrong in the setup that forced me 
down this path?

Chris.


On 3/12/2012 8:47 AM, Chris Keane wrote:
> Further musings over my inability to get multi-master replication
> running in Bucardo 5. As a reminder, everything sets up seemingly fine,
> bucardo starts without errors or complaints and makes connections to
> both the master databases. When an update occurs on either database, I
> see the kid getting kicked but then zero rows update. There are entries
> in the per-table delta tables, seemingly waiting to be consumed.
>
> I've been poking around in Bucardo.pm to try to figure what I need to
> set or unset to make it go.
> I noticed that in start_kid, at or around line 2895 of Bucardo.pm (git
> head from last week), there is a check:
>           ## Do all the delta (non-fullcopy) targets
>           if (@dbs_delta) {
> then all the fun happens inside this conditional. However, @dbs_delta is
> empty and I don't ever enter this conditional. So the kid starts up with
> the notify and all is well, it just never gets to the fun part because
> @dbs_delta is empty.
>
> @db_delta is constructed as:
> push @dbs_delta =>  $dbname
>               if $x->{role} eq 'target'
>                   and ! $sync->{onetimecopy}
>                       and ! $sync->{fullcopy};
> Since the databases in question are set up as sources, they're never
> added to @dbs_delta. Do I need to set up at least one target in
> additional to the sources?
>
> Chris.
>
>
>
> On 3/9/2012 9:16 PM, Chris Keane wrote:
>>
>> On 3/9/2012 5:08 PM, Greg Sabino Mullane wrote:
>>>>       1. The only "suspicious" thing is that I get messages saying
>>>>          Inspecting source table "weird.table1" on database "localbox"
>>>>          but I never see the same message for "internetbox"
>>> It should be inspecting both sides, but those messages may be at a lower
>>> debug level than you are using. As a general principle, when things
>>> go wrong, lower the debug level to verbose and then to debug, for a
>>> clearer (and verbose) picture of what is happening:
>>>
>>> $ bucardo set log_level=verbose
>> I suspect I was already at verbose level, no new messages were produced.
>> Here's what it looks like starting up (note, there are really 4
>> different syncs that represent 4 different herds.
>> (14349) [Fri Mar  9 23:54:19 2012] MCP Loading sync table. Rows=4
>> (14349) [Fri Mar  9 23:54:19 2012] MCP Activating sync "interesting"
>> (14349) [Fri Mar  9 23:54:19 2012] MCP Running validate_sync on
>> "interesting"
>> (14349) [Fri Mar  9 23:54:19 2012] MCP Connecting to database "ecs"
>> (source)
>> (14349) [Fri Mar  9 23:54:20 2012] MCP Database "ecs" backend PID: 14351
>> (14349) [Fri Mar  9 23:54:20 2012] MCP DB "ecs" Local epoch:
>> 1331355260.02512  DB epoch: 1331355260.02565
>> (14349) [Fri Mar  9 23:54:20 2012] MCP DB "ecs" Local time: Fri Mar  9
>> 23:54:20 2012  DB time: 2012-03-09 23:54:20.025651-05
>> (14349) [Fri Mar  9 23:54:20 2012] MCP DB "ecs" Local timezone: EST
>> (-0500)  DB timezone: localtime
>> (14349) [Fri Mar  9 23:54:20 2012] MCP DB "ecs" Postgres version: 90103
>> (14349) [Fri Mar  9 23:54:20 2012] MCP DB "ecs" Database port: 5432
>> (14349) [Fri Mar  9 23:54:20 2012] MCP Connecting to database
>> "internet" (source)
>> (14349) [Fri Mar  9 23:54:21 2012] MCP Database "internet" backend
>> PID: 16959
>> (14349) [Fri Mar  9 23:54:21 2012] MCP DB "internet" Local epoch:
>> 1331355261.17374  DB epoch: 1331355261.97192
>> (14349) [Fri Mar  9 23:54:21 2012] MCP DB "internet" Local time: Fri
>> Mar  9 23:54:21 2012  DB time: 2012-03-10 04:54:21.971921+00
>> (14349) [Fri Mar  9 23:54:21 2012] MCP DB "internet" Local timezone:
>> EST (-0500)  DB timezone: UTC
>> (14349) [Fri Mar  9 23:54:21 2012] MCP DB "internet" Postgres version:
>> 90101
>> (14349) [Fri Mar  9 23:54:21 2012] MCP DB "internet" Database port: 5432
>> (14349) [Fri Mar  9 23:54:24 2012] MCP   Inspecting source table
>> "master.detect" on database "ecs"
>> (14349) [Fri Mar  9 23:54:24 2012] MCP   Inspecting source table
>> "master.race_audit" on database "ecs"
>> (14349) [Fri Mar  9 23:54:24 2012] MCP Activating sync "noncritical"
>> (14349) [Fri Mar  9 23:54:24 2012] MCP Running validate_sync on
>> "noncritical"
>> (14349) [Fri Mar  9 23:54:53 2012] MCP   Inspecting source table
>> "master.addressbook" on database "ecs"
>> (14349) [Fri Mar  9 23:54:53 2012] MCP   Inspecting source table
>> "master.affiliate" on database "ecs"
>> (14349) [Fri Mar  9 23:54:53 2012] MCP   Inspecting source table
>> "master.class_rules" on database "ecs"
>> (14349) [Fri Mar  9 23:54:53 2012] MCP   Inspecting source table
>> "master.classes" on database "ecs"
>> ....
>> Note, no mention of the database "internet" at all except for the
>> successful initial connection. Table inspections occur only on "ecs",
>> which is the local database.
>> There are no errors or anything that appears noteworthy in the logs
>> and with the exception of missing inspection on the internet side, it
>> looks perfectly normal.
>>
>> When an update is run on one of the replicated tables on EITHER the
>> local "ecs" or remote "internet" dbs, I get
>> (14349) [Fri Mar  9 23:58:50 2012] MCP Sent a kick to controller 14361
>> for sync "opscritical"
>> (14380) [Fri Mar  9 23:58:51 2012] KID Issuing final commit for all
>> databases
>> (14380) [Fri Mar  9 23:58:51 2012] KID All databases committed
>> (14380) [Fri Mar  9 23:58:51 2012] KID Total time for sync
>> "opscritical" (0 rows): 0.14 seconds
>>
>> So the notify seems to be happening... just nothing after that
>>
>> In other words, no rows copied across even though the databases are
>> now out of sync. Same output when manually kicking the sync.
>> Every now and then, I get a line for each replicated table:
>> NOTICE:  Rows deleted from delta_master_tablename: 0 Rows deleted from
>> track_master_tablename: 0
>>
>> (master is the schema, tablename the name of the table)
>>
>> Then sometimes, I also get a
>> (13539) [Fri Mar  9 23:49:15 2012] VAC Running bucardo_purge_delta on
>> database "ecs"
>> (13539) [Fri Mar  9 23:49:15 2012] VAC Running bucardo_purge_delta on
>> database "internet"
>>
>> I've solved many weird problems in the past with 4.x, so I'm fairly
>> experienced at troubleshooting. This one, however, has me scratching
>> my head. bucardo install was done on the local database (same postgres
>> server as "ecs"). All tables are heavily triggered (apps actually
>> refer to views in the public schema that serve data from the "master"
>> schema based on various permission rules, so all updates, inserts and
>> deletes run our own triggers that update various permission maps. I
>> do, however, see the bucardo triggers correctly applied and active
>> right alongside ours). I've removed the entire thing, gone back to
>> Bucardo 4.4 and it works fine. Moving back up to Bucardo 5, same problem.
>>
>> Here's an example goat entry:
>> id | db  | schemaname |     tablename      | reltype
>> |
>> pkey
>> |
>> qpkey                                 |
>> pkeytype             | has_delta | ping | customselect |
>> standard_conflict | makedelta | rebuild_index | ghost |
>> analyze_after_copy | vacuum_after_copy | strict_checking |
>> delta_bypass | delta_bypass_min | delta_bypass_count |
>> delta_bypass_percent |             cdate
>> ----+-----+------------+--------------------+---------+-----------------------------------------------------------------------+-----------------------------------------------------------------------+---------------------------------+-----------+------+--------------+-------------------+-----------+---------------+-------+--------------------+-------------------+-----------------+--------------+------------------+--------------------+----------------------+-------------------------------
>>
>>    1 | ecs | master     | laptimes           | table   |
>> lt_ltid
>> |
>> lt_ltid
>> | int8                            | f         |      |
>> |                   |           |             0 | f     |
>> t                  | t                 | t               |
>> f            |                  |
>> |                      | 2012-03-09 16:52:55.3795-05
>>
>> and an example entry from sync:
>>       name     |     herd     |       dbs        | stayalive |
>> kidsalive | standard_conflict | usecustomselect | copyextra |
>> deletemethod | limitdbs | ping | do_listen | checktime | status |
>> makedelta | rebuild_index | priority | analyze_after_copy |
>> vacuum_after_copy | strict_checking | overdue  | expired  |
>> track_rates | onetimecopy | lifetime | maxkicks |             cdate
>> --------------+--------------+------------------+-----------+-----------+-------------------+-----------------+-----------+--------------+----------+------+-----------+-----------+--------+-----------+---------------+----------+--------------------+-------------------+-----------------+----------+----------+-------------+-------------+----------+----------+-------------------------------
>>
>>   racecritical | racecritical | replicationgroup | t         |
>> t         |                   | f               |           |
>> delete       |        0 | t    | f         |           | active
>> |           |             0 |        0 | t                  |
>> f                 | t               | 00:00:00 | 00:00:00 |
>> f           |           0 |          |        0 | 2012-03-09
>> 16:59:49.569907-05
>>
>>
>> Finally, db:
>>     name   |  dbtype  |       dbhost        | dbport | dbname | dbuser
>> | dbpass |     dbconn      | dbservice | pgpass | status |
>> server_side_prepares | makedelta |             cdate
>> ----------+----------+---------------------+--------+--------+---------+--------+-----------------+-----------+--------+--------+----------------------+-----------+-------------------------------
>>
>>   ecs      | postgres |                     |        | utvrr  | bucardo
>> |        |                 |           |        | active |
>> t                    | f         | 2012-03-09 16:52:32.581565-05
>>   internet | postgres | xxxxxxx.xxxxxxx.com | 5432   | utvrr  | bucardo
>> |        | sslmode=require |           |        | active |
>> t                    | f         | 2012-03-09 16:52:44.533349-05
>>
>> Any thoughts on this appreciated.
>>
>> Chris.
>>
>>
> _______________________________________________
> Bucardo-general mailing list
> Bucardo-general at bucardo.org
> https://mail.endcrypt.com/mailman/listinfo/bucardo-general



More information about the Bucardo-general mailing list