[Bucardo-general] Bucardo 5 install issue
Chris Keane
chris.keane at zzgi.com
Mon Mar 12 16:01:01 UTC 2012
Another update:
I sucked it up and changed Bucardo.pm
push @dbs_delta => $dbname
if ($x->{role} eq 'target' || $x->{role} eq 'source')
and ! $sync->{onetimecopy}
and ! $sync->{fullcopy};
(added dbs with a role of source into @db_deltas)... fired off bucardo
and hey presto replication.
OK, so now I know that, what did I do wrong in the setup that forced me
down this path?
Chris.
On 3/12/2012 8:47 AM, Chris Keane wrote:
> Further musings over my inability to get multi-master replication
> running in Bucardo 5. As a reminder, everything sets up seemingly fine,
> bucardo starts without errors or complaints and makes connections to
> both the master databases. When an update occurs on either database, I
> see the kid getting kicked but then zero rows update. There are entries
> in the per-table delta tables, seemingly waiting to be consumed.
>
> I've been poking around in Bucardo.pm to try to figure what I need to
> set or unset to make it go.
> I noticed that in start_kid, at or around line 2895 of Bucardo.pm (git
> head from last week), there is a check:
> ## Do all the delta (non-fullcopy) targets
> if (@dbs_delta) {
> then all the fun happens inside this conditional. However, @dbs_delta is
> empty and I don't ever enter this conditional. So the kid starts up with
> the notify and all is well, it just never gets to the fun part because
> @dbs_delta is empty.
>
> @db_delta is constructed as:
> push @dbs_delta => $dbname
> if $x->{role} eq 'target'
> and ! $sync->{onetimecopy}
> and ! $sync->{fullcopy};
> Since the databases in question are set up as sources, they're never
> added to @dbs_delta. Do I need to set up at least one target in
> additional to the sources?
>
> Chris.
>
>
>
> On 3/9/2012 9:16 PM, Chris Keane wrote:
>>
>> On 3/9/2012 5:08 PM, Greg Sabino Mullane wrote:
>>>> 1. The only "suspicious" thing is that I get messages saying
>>>> Inspecting source table "weird.table1" on database "localbox"
>>>> but I never see the same message for "internetbox"
>>> It should be inspecting both sides, but those messages may be at a lower
>>> debug level than you are using. As a general principle, when things
>>> go wrong, lower the debug level to verbose and then to debug, for a
>>> clearer (and verbose) picture of what is happening:
>>>
>>> $ bucardo set log_level=verbose
>> I suspect I was already at verbose level, no new messages were produced.
>> Here's what it looks like starting up (note, there are really 4
>> different syncs that represent 4 different herds.
>> (14349) [Fri Mar 9 23:54:19 2012] MCP Loading sync table. Rows=4
>> (14349) [Fri Mar 9 23:54:19 2012] MCP Activating sync "interesting"
>> (14349) [Fri Mar 9 23:54:19 2012] MCP Running validate_sync on
>> "interesting"
>> (14349) [Fri Mar 9 23:54:19 2012] MCP Connecting to database "ecs"
>> (source)
>> (14349) [Fri Mar 9 23:54:20 2012] MCP Database "ecs" backend PID: 14351
>> (14349) [Fri Mar 9 23:54:20 2012] MCP DB "ecs" Local epoch:
>> 1331355260.02512 DB epoch: 1331355260.02565
>> (14349) [Fri Mar 9 23:54:20 2012] MCP DB "ecs" Local time: Fri Mar 9
>> 23:54:20 2012 DB time: 2012-03-09 23:54:20.025651-05
>> (14349) [Fri Mar 9 23:54:20 2012] MCP DB "ecs" Local timezone: EST
>> (-0500) DB timezone: localtime
>> (14349) [Fri Mar 9 23:54:20 2012] MCP DB "ecs" Postgres version: 90103
>> (14349) [Fri Mar 9 23:54:20 2012] MCP DB "ecs" Database port: 5432
>> (14349) [Fri Mar 9 23:54:20 2012] MCP Connecting to database
>> "internet" (source)
>> (14349) [Fri Mar 9 23:54:21 2012] MCP Database "internet" backend
>> PID: 16959
>> (14349) [Fri Mar 9 23:54:21 2012] MCP DB "internet" Local epoch:
>> 1331355261.17374 DB epoch: 1331355261.97192
>> (14349) [Fri Mar 9 23:54:21 2012] MCP DB "internet" Local time: Fri
>> Mar 9 23:54:21 2012 DB time: 2012-03-10 04:54:21.971921+00
>> (14349) [Fri Mar 9 23:54:21 2012] MCP DB "internet" Local timezone:
>> EST (-0500) DB timezone: UTC
>> (14349) [Fri Mar 9 23:54:21 2012] MCP DB "internet" Postgres version:
>> 90101
>> (14349) [Fri Mar 9 23:54:21 2012] MCP DB "internet" Database port: 5432
>> (14349) [Fri Mar 9 23:54:24 2012] MCP Inspecting source table
>> "master.detect" on database "ecs"
>> (14349) [Fri Mar 9 23:54:24 2012] MCP Inspecting source table
>> "master.race_audit" on database "ecs"
>> (14349) [Fri Mar 9 23:54:24 2012] MCP Activating sync "noncritical"
>> (14349) [Fri Mar 9 23:54:24 2012] MCP Running validate_sync on
>> "noncritical"
>> (14349) [Fri Mar 9 23:54:53 2012] MCP Inspecting source table
>> "master.addressbook" on database "ecs"
>> (14349) [Fri Mar 9 23:54:53 2012] MCP Inspecting source table
>> "master.affiliate" on database "ecs"
>> (14349) [Fri Mar 9 23:54:53 2012] MCP Inspecting source table
>> "master.class_rules" on database "ecs"
>> (14349) [Fri Mar 9 23:54:53 2012] MCP Inspecting source table
>> "master.classes" on database "ecs"
>> ....
>> Note, no mention of the database "internet" at all except for the
>> successful initial connection. Table inspections occur only on "ecs",
>> which is the local database.
>> There are no errors or anything that appears noteworthy in the logs
>> and with the exception of missing inspection on the internet side, it
>> looks perfectly normal.
>>
>> When an update is run on one of the replicated tables on EITHER the
>> local "ecs" or remote "internet" dbs, I get
>> (14349) [Fri Mar 9 23:58:50 2012] MCP Sent a kick to controller 14361
>> for sync "opscritical"
>> (14380) [Fri Mar 9 23:58:51 2012] KID Issuing final commit for all
>> databases
>> (14380) [Fri Mar 9 23:58:51 2012] KID All databases committed
>> (14380) [Fri Mar 9 23:58:51 2012] KID Total time for sync
>> "opscritical" (0 rows): 0.14 seconds
>>
>> So the notify seems to be happening... just nothing after that
>>
>> In other words, no rows copied across even though the databases are
>> now out of sync. Same output when manually kicking the sync.
>> Every now and then, I get a line for each replicated table:
>> NOTICE: Rows deleted from delta_master_tablename: 0 Rows deleted from
>> track_master_tablename: 0
>>
>> (master is the schema, tablename the name of the table)
>>
>> Then sometimes, I also get a
>> (13539) [Fri Mar 9 23:49:15 2012] VAC Running bucardo_purge_delta on
>> database "ecs"
>> (13539) [Fri Mar 9 23:49:15 2012] VAC Running bucardo_purge_delta on
>> database "internet"
>>
>> I've solved many weird problems in the past with 4.x, so I'm fairly
>> experienced at troubleshooting. This one, however, has me scratching
>> my head. bucardo install was done on the local database (same postgres
>> server as "ecs"). All tables are heavily triggered (apps actually
>> refer to views in the public schema that serve data from the "master"
>> schema based on various permission rules, so all updates, inserts and
>> deletes run our own triggers that update various permission maps. I
>> do, however, see the bucardo triggers correctly applied and active
>> right alongside ours). I've removed the entire thing, gone back to
>> Bucardo 4.4 and it works fine. Moving back up to Bucardo 5, same problem.
>>
>> Here's an example goat entry:
>> id | db | schemaname | tablename | reltype
>> |
>> pkey
>> |
>> qpkey |
>> pkeytype | has_delta | ping | customselect |
>> standard_conflict | makedelta | rebuild_index | ghost |
>> analyze_after_copy | vacuum_after_copy | strict_checking |
>> delta_bypass | delta_bypass_min | delta_bypass_count |
>> delta_bypass_percent | cdate
>> ----+-----+------------+--------------------+---------+-----------------------------------------------------------------------+-----------------------------------------------------------------------+---------------------------------+-----------+------+--------------+-------------------+-----------+---------------+-------+--------------------+-------------------+-----------------+--------------+------------------+--------------------+----------------------+-------------------------------
>>
>> 1 | ecs | master | laptimes | table |
>> lt_ltid
>> |
>> lt_ltid
>> | int8 | f | |
>> | | | 0 | f |
>> t | t | t |
>> f | |
>> | | 2012-03-09 16:52:55.3795-05
>>
>> and an example entry from sync:
>> name | herd | dbs | stayalive |
>> kidsalive | standard_conflict | usecustomselect | copyextra |
>> deletemethod | limitdbs | ping | do_listen | checktime | status |
>> makedelta | rebuild_index | priority | analyze_after_copy |
>> vacuum_after_copy | strict_checking | overdue | expired |
>> track_rates | onetimecopy | lifetime | maxkicks | cdate
>> --------------+--------------+------------------+-----------+-----------+-------------------+-----------------+-----------+--------------+----------+------+-----------+-----------+--------+-----------+---------------+----------+--------------------+-------------------+-----------------+----------+----------+-------------+-------------+----------+----------+-------------------------------
>>
>> racecritical | racecritical | replicationgroup | t |
>> t | | f | |
>> delete | 0 | t | f | | active
>> | | 0 | 0 | t |
>> f | t | 00:00:00 | 00:00:00 |
>> f | 0 | | 0 | 2012-03-09
>> 16:59:49.569907-05
>>
>>
>> Finally, db:
>> name | dbtype | dbhost | dbport | dbname | dbuser
>> | dbpass | dbconn | dbservice | pgpass | status |
>> server_side_prepares | makedelta | cdate
>> ----------+----------+---------------------+--------+--------+---------+--------+-----------------+-----------+--------+--------+----------------------+-----------+-------------------------------
>>
>> ecs | postgres | | | utvrr | bucardo
>> | | | | | active |
>> t | f | 2012-03-09 16:52:32.581565-05
>> internet | postgres | xxxxxxx.xxxxxxx.com | 5432 | utvrr | bucardo
>> | | sslmode=require | | | active |
>> t | f | 2012-03-09 16:52:44.533349-05
>>
>> Any thoughts on this appreciated.
>>
>> Chris.
>>
>>
> _______________________________________________
> Bucardo-general mailing list
> Bucardo-general at bucardo.org
> https://mail.endcrypt.com/mailman/listinfo/bucardo-general
More information about the Bucardo-general
mailing list