[Bucardo-general] Bucardo 5 install issue
Chris Keane
chris.keane at zzgi.com
Mon Mar 12 15:47:07 UTC 2012
Further musings over my inability to get multi-master replication
running in Bucardo 5. As a reminder, everything sets up seemingly fine,
bucardo starts without errors or complaints and makes connections to
both the master databases. When an update occurs on either database, I
see the kid getting kicked but then zero rows update. There are entries
in the per-table delta tables, seemingly waiting to be consumed.
I've been poking around in Bucardo.pm to try to figure what I need to
set or unset to make it go.
I noticed that in start_kid, at or around line 2895 of Bucardo.pm (git
head from last week), there is a check:
## Do all the delta (non-fullcopy) targets
if (@dbs_delta) {
then all the fun happens inside this conditional. However, @dbs_delta is
empty and I don't ever enter this conditional. So the kid starts up with
the notify and all is well, it just never gets to the fun part because
@dbs_delta is empty.
@db_delta is constructed as:
push @dbs_delta => $dbname
if $x->{role} eq 'target'
and ! $sync->{onetimecopy}
and ! $sync->{fullcopy};
Since the databases in question are set up as sources, they're never
added to @dbs_delta. Do I need to set up at least one target in
additional to the sources?
Chris.
On 3/9/2012 9:16 PM, Chris Keane wrote:
>
>
> On 3/9/2012 5:08 PM, Greg Sabino Mullane wrote:
>>> 1. The only "suspicious" thing is that I get messages saying
>>> Inspecting source table "weird.table1" on database "localbox"
>>> but I never see the same message for "internetbox"
>> It should be inspecting both sides, but those messages may be at a lower
>> debug level than you are using. As a general principle, when things
>> go wrong, lower the debug level to verbose and then to debug, for a
>> clearer (and verbose) picture of what is happening:
>>
>> $ bucardo set log_level=verbose
> I suspect I was already at verbose level, no new messages were produced.
> Here's what it looks like starting up (note, there are really 4
> different syncs that represent 4 different herds.
> (14349) [Fri Mar 9 23:54:19 2012] MCP Loading sync table. Rows=4
> (14349) [Fri Mar 9 23:54:19 2012] MCP Activating sync "interesting"
> (14349) [Fri Mar 9 23:54:19 2012] MCP Running validate_sync on
> "interesting"
> (14349) [Fri Mar 9 23:54:19 2012] MCP Connecting to database "ecs"
> (source)
> (14349) [Fri Mar 9 23:54:20 2012] MCP Database "ecs" backend PID: 14351
> (14349) [Fri Mar 9 23:54:20 2012] MCP DB "ecs" Local epoch:
> 1331355260.02512 DB epoch: 1331355260.02565
> (14349) [Fri Mar 9 23:54:20 2012] MCP DB "ecs" Local time: Fri Mar 9
> 23:54:20 2012 DB time: 2012-03-09 23:54:20.025651-05
> (14349) [Fri Mar 9 23:54:20 2012] MCP DB "ecs" Local timezone: EST
> (-0500) DB timezone: localtime
> (14349) [Fri Mar 9 23:54:20 2012] MCP DB "ecs" Postgres version: 90103
> (14349) [Fri Mar 9 23:54:20 2012] MCP DB "ecs" Database port: 5432
> (14349) [Fri Mar 9 23:54:20 2012] MCP Connecting to database
> "internet" (source)
> (14349) [Fri Mar 9 23:54:21 2012] MCP Database "internet" backend
> PID: 16959
> (14349) [Fri Mar 9 23:54:21 2012] MCP DB "internet" Local epoch:
> 1331355261.17374 DB epoch: 1331355261.97192
> (14349) [Fri Mar 9 23:54:21 2012] MCP DB "internet" Local time: Fri
> Mar 9 23:54:21 2012 DB time: 2012-03-10 04:54:21.971921+00
> (14349) [Fri Mar 9 23:54:21 2012] MCP DB "internet" Local timezone:
> EST (-0500) DB timezone: UTC
> (14349) [Fri Mar 9 23:54:21 2012] MCP DB "internet" Postgres version:
> 90101
> (14349) [Fri Mar 9 23:54:21 2012] MCP DB "internet" Database port: 5432
> (14349) [Fri Mar 9 23:54:24 2012] MCP Inspecting source table
> "master.detect" on database "ecs"
> (14349) [Fri Mar 9 23:54:24 2012] MCP Inspecting source table
> "master.race_audit" on database "ecs"
> (14349) [Fri Mar 9 23:54:24 2012] MCP Activating sync "noncritical"
> (14349) [Fri Mar 9 23:54:24 2012] MCP Running validate_sync on
> "noncritical"
> (14349) [Fri Mar 9 23:54:53 2012] MCP Inspecting source table
> "master.addressbook" on database "ecs"
> (14349) [Fri Mar 9 23:54:53 2012] MCP Inspecting source table
> "master.affiliate" on database "ecs"
> (14349) [Fri Mar 9 23:54:53 2012] MCP Inspecting source table
> "master.class_rules" on database "ecs"
> (14349) [Fri Mar 9 23:54:53 2012] MCP Inspecting source table
> "master.classes" on database "ecs"
> ....
> Note, no mention of the database "internet" at all except for the
> successful initial connection. Table inspections occur only on "ecs",
> which is the local database.
> There are no errors or anything that appears noteworthy in the logs
> and with the exception of missing inspection on the internet side, it
> looks perfectly normal.
>
> When an update is run on one of the replicated tables on EITHER the
> local "ecs" or remote "internet" dbs, I get
> (14349) [Fri Mar 9 23:58:50 2012] MCP Sent a kick to controller 14361
> for sync "opscritical"
> (14380) [Fri Mar 9 23:58:51 2012] KID Issuing final commit for all
> databases
> (14380) [Fri Mar 9 23:58:51 2012] KID All databases committed
> (14380) [Fri Mar 9 23:58:51 2012] KID Total time for sync
> "opscritical" (0 rows): 0.14 seconds
>
> So the notify seems to be happening... just nothing after that
>
> In other words, no rows copied across even though the databases are
> now out of sync. Same output when manually kicking the sync.
> Every now and then, I get a line for each replicated table:
> NOTICE: Rows deleted from delta_master_tablename: 0 Rows deleted from
> track_master_tablename: 0
>
> (master is the schema, tablename the name of the table)
>
> Then sometimes, I also get a
> (13539) [Fri Mar 9 23:49:15 2012] VAC Running bucardo_purge_delta on
> database "ecs"
> (13539) [Fri Mar 9 23:49:15 2012] VAC Running bucardo_purge_delta on
> database "internet"
>
> I've solved many weird problems in the past with 4.x, so I'm fairly
> experienced at troubleshooting. This one, however, has me scratching
> my head. bucardo install was done on the local database (same postgres
> server as "ecs"). All tables are heavily triggered (apps actually
> refer to views in the public schema that serve data from the "master"
> schema based on various permission rules, so all updates, inserts and
> deletes run our own triggers that update various permission maps. I
> do, however, see the bucardo triggers correctly applied and active
> right alongside ours). I've removed the entire thing, gone back to
> Bucardo 4.4 and it works fine. Moving back up to Bucardo 5, same problem.
>
> Here's an example goat entry:
> id | db | schemaname | tablename | reltype
> |
> pkey
> |
> qpkey |
> pkeytype | has_delta | ping | customselect |
> standard_conflict | makedelta | rebuild_index | ghost |
> analyze_after_copy | vacuum_after_copy | strict_checking |
> delta_bypass | delta_bypass_min | delta_bypass_count |
> delta_bypass_percent | cdate
> ----+-----+------------+--------------------+---------+-----------------------------------------------------------------------+-----------------------------------------------------------------------+---------------------------------+-----------+------+--------------+-------------------+-----------+---------------+-------+--------------------+-------------------+-----------------+--------------+------------------+--------------------+----------------------+-------------------------------
>
> 1 | ecs | master | laptimes | table |
> lt_ltid
> |
> lt_ltid
> | int8 | f | |
> | | | 0 | f |
> t | t | t |
> f | |
> | | 2012-03-09 16:52:55.3795-05
>
> and an example entry from sync:
> name | herd | dbs | stayalive |
> kidsalive | standard_conflict | usecustomselect | copyextra |
> deletemethod | limitdbs | ping | do_listen | checktime | status |
> makedelta | rebuild_index | priority | analyze_after_copy |
> vacuum_after_copy | strict_checking | overdue | expired |
> track_rates | onetimecopy | lifetime | maxkicks | cdate
> --------------+--------------+------------------+-----------+-----------+-------------------+-----------------+-----------+--------------+----------+------+-----------+-----------+--------+-----------+---------------+----------+--------------------+-------------------+-----------------+----------+----------+-------------+-------------+----------+----------+-------------------------------
>
> racecritical | racecritical | replicationgroup | t |
> t | | f | |
> delete | 0 | t | f | | active
> | | 0 | 0 | t |
> f | t | 00:00:00 | 00:00:00 |
> f | 0 | | 0 | 2012-03-09
> 16:59:49.569907-05
>
>
> Finally, db:
> name | dbtype | dbhost | dbport | dbname | dbuser
> | dbpass | dbconn | dbservice | pgpass | status |
> server_side_prepares | makedelta | cdate
> ----------+----------+---------------------+--------+--------+---------+--------+-----------------+-----------+--------+--------+----------------------+-----------+-------------------------------
>
> ecs | postgres | | | utvrr | bucardo
> | | | | | active |
> t | f | 2012-03-09 16:52:32.581565-05
> internet | postgres | xxxxxxx.xxxxxxx.com | 5432 | utvrr | bucardo
> | | sslmode=require | | | active |
> t | f | 2012-03-09 16:52:44.533349-05
>
> Any thoughts on this appreciated.
>
> Chris.
>
>
More information about the Bucardo-general
mailing list