[Bucardo-general] Re-syncing a single table in sync group

David Christensen david at endpoint.com
Thu Dec 12 18:26:58 UTC 2019


> On Dec 12, 2019, at 11:17 AM, Jeff Ross <rossj at cargotel.com> wrote:
> 
> Hi all,
> 
> I've had bucardo up and running as a master-master now for a couple of months.  Recently I noticed that one of the tables on the initial target database has only about 110,000 rows compared to the initial source database where that table has almost 2 million.  In looking over my setup script I discovered that in the initial pg_dump of data only from initial source to target that table was accidentally not included.
> 
> I currently have 2 syncs running and bucardo status shows them both to be Good.  Additionally, bucardo validate sync shows both syncs to be valid.
> 
> What would be the best way to get the rows from the initial source table to the target table?  I don't see a way to force the re-sync using bucardo.  I can disable the triggers on the target table, dump the data and insert and then re-enable triggers, as the initial data load did using the session replication role.  Or maybe the thing to do is to split that table out into a new sync and then do the initial seeding?

Well, there are a couple of options to refresh a table in Bucardo; as with many things, there is an engineering tradeoff to be considered:


1) The straightforward option; something like this will repopulate the table “foo”:

$ pg_dump -h master --data-only -t foo | psql -1 -c 'set session_replication_role = replica' -c 'truncate table foo' -f -

In particular, this sets session_replication_role to replica to prevent trigger firing, truncates said table, then loads everything from the current state of the source table with pg_dump, all within a single transaction, so it’s safe to run without ending up in an indeterminate state.
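Afterward, a quick row-count comparison on both sides confirms the reload landed; a rough sketch (the host names “master” and “target” are placeholders for your two databases, and counts can differ slightly while writes are in flight):

```shell
# Compare row counts for foo on both ends of the sync.
for h in master target; do
  echo -n "$h: "
  psql -h "$h" -Atc 'SELECT count(*) FROM foo'
done
```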

Upsides: easy, don’t need to modify the triggers on any table config, etc.  Any changes made to the table while this process is ongoing will be replicated once the sync resumes.

Downsides: will likely block replication activity for the duration of the COPY; once the TRUNCATE takes its lock, no additional writes are allowed, so any further changes to the table will block until this operation completes.  It will likely also block replication of any other objects in this sync.


2) (As you suggested) drop the table from the sync, then re-add it in a new sync.  (It’s probably possible to add it to an existing sync and just set onetimecopy=2, so maybe look into this too.)
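If the onetimecopy route pans out, it might look something like the following (a sketch, not gospel; “mysync” is a placeholder for your sync name, and note that onetimecopy=2 only copies tables whose target is empty, so the partially-loaded table may need truncating first, or onetimecopy=1 to force a full copy of everything in the sync):

```shell
# Ask Bucardo to do a one-time full COPY on the sync's next run.
# onetimecopy=1 copies every table; onetimecopy=2 copies only tables
# that are empty on the target.
bucardo update sync mysync onetimecopy=2

# Kick the sync so the copy happens now rather than on the next change.
bucardo kick mysync
```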

Upside: prevents blocking of the rest of the cluster while this single table is repopulated.

Downsides: if the table has foreign keys to other tables that Bucardo replicates, you’re not guaranteed to be in a state where everything is valid; i.e., if this new sync stops but the sync carrying the referenced data keeps going, you can end up with orphaned rows.


3) Update the table with an effective noop; something like: UPDATE foo SET col = col.  This writes delta rows for every row touched, and bucardo will then sync them across as if a bunch of changes had been made.

Upsides: all contained in SQL, will have the desired effect eventually.

Downsides: generates a bunch of db writes, network traffic, etc, as 2 million rows are updated, delta rows are written, and bucardo copies all of them.


HTH,

David
--
David Christensen
Senior Software and Database Engineer
End Point Corporation
david at endpoint.com
785-727-1171



