[Bucardo-general] Replication isn't working and status all gives a persistent error that doesn't match the state of the replicated databases

David Christensen david at endpoint.com
Thu Feb 8 16:19:05 UTC 2018


> On Feb 8, 2018, at 9:35 AM, Jeff Silverman <jsilverman at blispay.com> wrote:
> 
> Hi, David, thanks for the reply. We were able to resolve this. Turns out the error I posted was a red herring, and had no relevance. Which leads me to a separate question, but I'll describe our resolution, first. I'll post the details for closure's sake.
> 
> So, the problem turned out to be that there were tables that were renamed due to our schema change process. But these changes were not accounted for in our bucardo database, which led to an error. The real issue we struggled with was opaqueness in the way bucardo reports errors.
> 
> The initial hints at this problem were found during the reload, but the reload error didn't have any useful information in it.
> 
>     $ bucardo reload oltpdb_to_olapdw_sync
>     Reloading sync oltpdb_to_olapdw_sync...Reload of sync oltpdb_to_olapdw_sync failed
> 
> `bucardo status` just said "Good", even though the "Last good" column was many hours old at this point.
> 
> We finally stumbled across the error by running `bucardo validate`:
> 
>     # bucardo validate all
>     Validating sync oltpdb_to_olapdw_sync ... WARNING:  Issuing rollback() due to DESTROY without explicit disconnect() of DBD::Pg::db handle dbname=oltpdb;host=oltp01;sslmode=require at line 1018.
>     CONTEXT:  PL/Perl function "validate_sync"
>     ERROR:  Could not find "mid_transaction_types" inside the "dom_merchant" schema on database "oltpdb"!   # <--- HERE; yes, this schema no longer exists in this database
>     CONTEXT:  PL/Perl function "validate_sync" at /usr/local/bin/bucardo line 1266.
> 
> 
> So running `bucardo remove table <tablename>` for each table that had been renamed in the master's schema fixed the problem.
> 
> 
> Which leads to some questions:
> 1) Why is the error reporting so poor here? Is there any way this can be improved?
>    - I tried using the '--verbose' flag when running bucardo commands but that didn't add any extra information
>    - I looked at the bucardo log on disk but it didn't mention the underlying issue

Yes, this could (should?) definitely be improved; at the very least, Bucardo could suggest running “validate” on the sync whenever it prints the “reload failed” message.
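
Until something like that exists in Bucardo itself, a small wrapper can approximate it. The following is only a minimal sketch and not part of Bucardo: it assumes that a failed reload prints a message containing "failed" (as in the output you quoted), and the function name `reload_with_validate` and the `BUCARDO` override variable are made up for illustration.

```shell
#!/bin/sh
# Sketch only: reload a sync and, if the reload reports failure, run
# "bucardo validate" on the same sync to surface the underlying error.
# BUCARDO is a hypothetical override so the command can be swapped or stubbed.
BUCARDO="${BUCARDO:-bucardo}"

reload_with_validate() {
    sync_name="$1"

    # Capture stdout and stderr together. The failure text is matched
    # instead of the exit status, because this sketch does not assume
    # what exit status a failed reload returns.
    output=$("$BUCARDO" reload "$sync_name" 2>&1)
    printf '%s\n' "$output"

    case "$output" in
        *failed*)
            echo "Reload failed; running validate on $sync_name for details:" >&2
            "$BUCARDO" validate "$sync_name" 2>&1
            return 1
            ;;
    esac
    return 0
}
```

After sourcing the file, it would be called as `reload_with_validate oltpdb_to_olapdw_sync`; on a failed reload the validate output (including any "Could not find ..." error) is printed right after the reload message.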

> 2) Is there any way to clear the error that persists every time I run `bucardo status all`?
> The error that currently appears is still there, but has no current relevance. That table is gone, and there's no row with that unique id *anywhere* in our oltp database. Also, the error that occurred during `bucardo validate` never appeared anywhere else, so we only figured that out by exhausting all our possibilities.

`bucardo status` actually just returns the latest row from the `syncrun` table. I’m not sure offhand whether we can clear that through the program, but I agree that distinguishing “last error” from “we have no errors” is important.

David
--
David Christensen
End Point Corporation
david at endpoint.com
785-727-1171


