[Bucardo-general] Replication isn't working and status all gives a persistent error that doesn't match the state of the replicated databases

Jeff Silverman jsilverman at blispay.com
Thu Feb 8 15:35:52 UTC 2018

Hi, David, thanks for the reply. We were able to resolve this. It turns out
the error I posted was a red herring and had no relevance. That leads me
to a separate question, but first I'll describe our resolution, for
closure's sake.

The problem turned out to be that some tables had been renamed by our
schema change process, but those changes were never reflected in our
bucardo database, which led to an error. The real issue we struggled
with was how opaquely bucardo reports errors.

The first hint of this problem came during the reload, but the reload
error didn't contain any useful information.

    $ bucardo reload oltpdb_to_olapdw_sync
    Reloading sync oltpdb_to_olapdw_sync...Reload of sync oltpdb_to_olapdw_sync failed

`bucardo status` just said "Good", even though the "Last good" column was
many hours old at that point.
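Since the "Good" state alone didn't reveal the stalled sync, one option is to alert on the age of the last good run instead. A minimal sketch in Python, assuming you have already extracted the "Last good" timestamp from `bucardo status` yourself (parsing that output is not shown); `is_stale` and the one-hour threshold are hypothetical, not part of Bucardo:

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_good: datetime, now: datetime, max_age: timedelta) -> bool:
    """Return True if the last successful sync run is older than max_age.

    Hypothetical helper: `last_good` is assumed to be the "Last good"
    timestamp already pulled out of `bucardo status` by some other means.
    """
    return now - last_good > max_age


if __name__ == "__main__":
    now = datetime(2018, 2, 8, 15, 35, tzinfo=timezone.utc)
    # A sync whose last good run was 9 hours ago, checked against a
    # hypothetical 1-hour threshold.
    print(is_stale(now - timedelta(hours=9), now, timedelta(hours=1)))
```

A check like this would have flagged the sync even while its state still read "Good".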

We finally stumbled across the error by running `bucardo validate`:

    # bucardo validate all
    Validating sync oltpdb_to_olapdw_sync ... WARNING:  Issuing rollback() due to DESTROY without explicit disconnect() of DBD::Pg::db handle dbname=oltpdb;host=oltp01;sslmode=require at line 1018.
    CONTEXT:  PL/Perl function "validate_sync"
    ERROR:  Could not find "mid_transaction_types" inside the "dom_merchant" schema on database "oltpdb"!   # <--- HERE; yes, this schema no longer exists in this database
    CONTEXT:  PL/Perl function "validate_sync" at /usr/local/bin/bucardo line

Running `bucardo remove table <tablename>` for every table that had been
renamed in the master's schema fixed the problem.
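The clean-up step amounts to a set difference: anything Bucardo still tracks that no longer exists on the source has to be removed. A minimal sketch in Python, assuming you have already gathered both lists of schema-qualified table names yourself (for example from the bucardo database and from the source database's catalog) and that `bucardo remove table` accepts names in that form; `stale_tables`, `removal_commands`, and `public.orders` are hypothetical:

```python
from __future__ import annotations


def stale_tables(registered: set[str], present: set[str]) -> list[str]:
    """Tables still registered in Bucardo but missing from the source DB."""
    return sorted(registered - present)


def removal_commands(tables: list[str]) -> list[str]:
    """One `bucardo remove table` invocation per stale table."""
    return [f"bucardo remove table {t}" for t in tables]


if __name__ == "__main__":
    # Hypothetical inputs: what Bucardo tracks vs. what the source has now.
    registered = {"dom_merchant.mid_transaction_types", "public.orders"}
    present = {"public.orders"}
    for cmd in removal_commands(stale_tables(registered, present)):
        print(cmd)
```

Printing the commands rather than executing them leaves room to review the list before touching the sync.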

That leads to some questions:
1) Why is the error reporting so poor here? Is there any way this can be
   improved?
   - I tried using the `--verbose` flag when running bucardo commands, but
     it didn't add any extra information.
   - I looked at the bucardo log on disk, but it didn't mention the
     underlying issue.

2) Is there any way to clear the error that persists every time I run
   `bucardo status all`?
The error that currently appears is stale and no longer relevant: that
table is gone, and there's no row with that unique id *anywhere* in our
oltp database. Also, the error that occurred during `bucardo validate`
never appeared anywhere else, so we only figured it out by exhausting all
our possibilities.

Thanks for your help
