[Bucardo-general] Bucardo 5 install issue

Sat Mar 10 05:16:21 UTC 2012

On 3/9/2012 5:08 PM, Greg Sabino Mullane wrote:
>>      1. The only "suspicious" thing is that I get messages saying
>>         Inspecting source table "weird.table1" on database "localbox"
>>         but I never see the same message for "internetbox"
> It should be inspecting both sides, but those messages may be at a lower
> debug level than you are using. As a general principle, when things
> go wrong, lower the debug level to verbose and then to debug, for a
> clearer (and verbose) picture of what is happening:
>
> $ bucardo set log_level=verbose
I suspect I was already at verbose level, no new messages were produced.
Here's what it looks like starting up (note, there are really 4 
different syncs that represent 4 different herds.
(14349) [Fri Mar  9 23:54:19 2012] MCP Loading sync table. Rows=4
(14349) [Fri Mar  9 23:54:19 2012] MCP Activating sync "interesting"
(14349) [Fri Mar  9 23:54:19 2012] MCP Running validate_sync on 
"interesting"
(14349) [Fri Mar  9 23:54:19 2012] MCP Connecting to database "ecs" (source)
(14349) [Fri Mar  9 23:54:20 2012] MCP Database "ecs" backend PID: 14351
(14349) [Fri Mar  9 23:54:20 2012] MCP DB "ecs" Local epoch: 
1331355260.02512  DB epoch: 1331355260.02565
(14349) [Fri Mar  9 23:54:20 2012] MCP DB "ecs" Local time: Fri Mar  9 
23:54:20 2012  DB time: 2012-03-09 23:54:20.025651-05
(14349) [Fri Mar  9 23:54:20 2012] MCP DB "ecs" Local timezone: EST 
(-0500)  DB timezone: localtime
(14349) [Fri Mar  9 23:54:20 2012] MCP DB "ecs" Postgres version: 90103
(14349) [Fri Mar  9 23:54:20 2012] MCP DB "ecs" Database port: 5432
(14349) [Fri Mar  9 23:54:20 2012] MCP Connecting to database "internet" 
(source)
(14349) [Fri Mar  9 23:54:21 2012] MCP Database "internet" backend PID: 
16959
(14349) [Fri Mar  9 23:54:21 2012] MCP DB "internet" Local epoch: 
1331355261.17374  DB epoch: 1331355261.97192
(14349) [Fri Mar  9 23:54:21 2012] MCP DB "internet" Local time: Fri 
Mar  9 23:54:21 2012  DB time: 2012-03-10 04:54:21.971921+00
(14349) [Fri Mar  9 23:54:21 2012] MCP DB "internet" Local timezone: EST 
(-0500)  DB timezone: UTC
(14349) [Fri Mar  9 23:54:21 2012] MCP DB "internet" Postgres version: 90101
(14349) [Fri Mar  9 23:54:21 2012] MCP DB "internet" Database port: 5432
(14349) [Fri Mar  9 23:54:24 2012] MCP   Inspecting source table 
"master.detect" on database "ecs"
(14349) [Fri Mar  9 23:54:24 2012] MCP   Inspecting source table 
"master.race_audit" on database "ecs"
(14349) [Fri Mar  9 23:54:24 2012] MCP Activating sync "noncritical"
(14349) [Fri Mar  9 23:54:24 2012] MCP Running validate_sync on 
"noncritical"
(14349) [Fri Mar  9 23:54:53 2012] MCP   Inspecting source table 
"master.addressbook" on database "ecs"
(14349) [Fri Mar  9 23:54:53 2012] MCP   Inspecting source table 
"master.affiliate" on database "ecs"
(14349) [Fri Mar  9 23:54:53 2012] MCP   Inspecting source table 
"master.class_rules" on database "ecs"
(14349) [Fri Mar  9 23:54:53 2012] MCP   Inspecting source table 
"master.classes" on database "ecs"
....
Note, no mention of the database "internet" at all except for the 
successful initial connection. Table inspections occur only on "ecs", 
which is the local database.
There are no errors or anything that appears noteworthy in the logs and 
with the exception of missing inspection on the internet side, it looks 
perfectly normal.

When an update is run on one of the replicated tables on EITHER the 
local "ecs" or remote "internet" dbs, I get
(14349) [Fri Mar  9 23:58:50 2012] MCP Sent a kick to controller 14361 
for sync "opscritical"
(14380) [Fri Mar  9 23:58:51 2012] KID Issuing final commit for all 
databases
(14380) [Fri Mar  9 23:58:51 2012] KID All databases committed
(14380) [Fri Mar  9 23:58:51 2012] KID Total time for sync "opscritical" 
(0 rows): 0.14 seconds

So the notify seems to be happening... just nothing after that

In other words, no rows copied across even though the databases are now 
out of sync. Same output when manually kicking the sync.
Every now and then, I get a line for each replicated table:
NOTICE:  Rows deleted from delta_master_tablename: 0 Rows deleted from 
track_master_tablename: 0

(master is the schema, tablename the name of the table)

Then sometimes, I also get a
(13539) [Fri Mar  9 23:49:15 2012] VAC Running bucardo_purge_delta on 
database "ecs"
(13539) [Fri Mar  9 23:49:15 2012] VAC Running bucardo_purge_delta on 
database "internet"

I've solved many weird problems in the past with 4.x, so I'm fairly 
experienced at troubleshooting. This one, however, has me scratching my 
head. bucardo install was done on the local database (same postgres 
server as "ecs"). All tables are heavily triggered (apps actually refer 
to views in the public schema that serve data from the "master" schema 
based on various permission rules, so all updates, inserts and deletes 
run our own triggers that update various permission maps. I do, however, 
see the bucardo triggers correctly applied and active right alongside 
ours). I've removed the entire thing, gone back to Bucardo 4.4 and it 
works fine. Moving back up to Bucardo 5, same problem.

Here's an example goat entry:
id | db  | schemaname |     tablename      | reltype 
|                                 pkey                                  
|                                 qpkey                                 
|            pkeytype             | has_delta | ping | customselect | 
standard_conflict | makedelta | rebuild_index | ghost | 
analyze_after_copy | vacuum_after_copy | strict_checking | delta_bypass 
| delta_bypass_min | delta_bypass_count | delta_bypass_percent 
|             cdate
----+-----+------------+--------------------+---------+-----------------------------------------------------------------------+-----------------------------------------------------------------------+---------------------------------+-----------+------+--------------+-------------------+-----------+---------------+-------+--------------------+-------------------+-----------------+--------------+------------------+--------------------+----------------------+-------------------------------
   1 | ecs | master     | laptimes           | table   | 
lt_ltid                                                               | 
lt_ltid                                                               | 
int8                            | f         |      |              
|                   |           |             0 | f     | 
t                  | t                 | t               | f            
|                  |                    |                      | 
2012-03-09 16:52:55.3795-05

and an example entry from sync:
      name     |     herd     |       dbs        | stayalive | kidsalive 
| standard_conflict | usecustomselect | copyextra | deletemethod | 
limitdbs | ping | do_listen | checktime | status | makedelta | 
rebuild_index | priority | analyze_after_copy | vacuum_after_copy | 
strict_checking | overdue  | expired  | track_rates | onetimecopy | 
lifetime | maxkicks |             cdate
--------------+--------------+------------------+-----------+-----------+-------------------+-----------------+-----------+--------------+----------+------+-----------+-----------+--------+-----------+---------------+----------+--------------------+-------------------+-----------------+----------+----------+-------------+-------------+----------+----------+-------------------------------
  racecritical | racecritical | replicationgroup | t         | t         
|                   | f               |           | delete       
|        0 | t    | f         |           | active |           
|             0 |        0 | t                  | f                 | 
t               | 00:00:00 | 00:00:00 | f           |           0 
|          |        0 | 2012-03-09 16:59:49.569907-05

Finally, db:
    name   |  dbtype  |       dbhost        | dbport | dbname | dbuser  
| dbpass |     dbconn      | dbservice | pgpass | status | 
server_side_prepares | makedelta |             cdate
----------+----------+---------------------+--------+--------+---------+--------+-----------------+-----------+--------+--------+----------------------+-----------+-------------------------------
  ecs      | postgres |                     |        | utvrr  | bucardo 
|        |                 |           |        | active | 
t                    | f         | 2012-03-09 16:52:32.581565-05
  internet | postgres | xxxxxxx.xxxxxxx.com | 5432   | utvrr  | bucardo 
|        | sslmode=require |           |        | active | 
t                    | f         | 2012-03-09 16:52:44.533349-05

Any thoughts on this appreciated.

Chris.