[Bucardo-general] Automatic Cascading Replication
David E. Wheeler
david at justatheory.com
Tue Dec 18 23:42:13 UTC 2012
On Dec 13, 2012, at 7:48 PM, Greg Sabino Mullane <greg at endpoint.com> wrote:
> Just a heads up, there is a lot of conversation at times on
> the #bucardo IRC channel on freenode, so make sure you are
> visiting there for some interesting (but low volume) traffic!
Yeah, and I’m slow to respond here. :-)
> I don't think this will work out. If we did that, we'd have some problems:
> * Our main 'get deltas' query would no longer work, but would have to be a
> handful of queries, per database. Additionally, it would add a lot of
> complexity to push_rows. Because each DB might demand a different select
> of distinct rows from bucardo_delta, we would also have to scan the
> source database(s) multiple times, worse case scenario, once per
> target DB. Ugh.
> * Since the data should already be in a logically self-contained set,
> it should never be the case that we save anything by letting some other
> sync do the work for us, because it's either going to violate our
> existing assumptions of which tables should get replicated as a group,
> or be the same as us in which case we don't win anything.
> * If we really want to avoid over-copying, there are probably better
> approaches, but I am not convinced this is much of a problem, as
> makedelta is 99% of the time used to replicate in a non-overlapping
> target fashion.
Well, let’s look at the use-case I have, then.
We’re setting up a multi-master configuration between two servers A and B (in different data centers) for our customer information. There are 10 or so tables for this. The multi-master stuff is perfect for this.
However, we have a bunch of other databases used for varying purposes, and that need to access a subset of the data from that system. For example, we have a project, let’s call it “frobisher,” that needs access to the "customers" table. Ideally, I would not change the existing MM config between the two databases, but just add a new sync that just syncs "customers" with both the main databases as sources and “frobisher” as a read-only target: `A:source B:source frobisher:target`.
But this means that every delta in the "customers" table will be replicated between A and B *twice*.
I suppose the solution is to configure the multi-master replication between A and B to exclude "customers", and then to add a separate sync that just targets "customers". But I expect to have a bunch of these kinds of "copy to this read-only target" things going on, so configuration might get to be a bit of a PITA.
I guess the simplest solution, given the current architecture, is to have a separate sync for every table. Kind of annoying, though.
>> As an aside: Ideally, when a table is being replicated with makedelta enabled,
>> the deltas would be created, but the autokick, if there is one, is *not* fired.
>> Any other sync autokicks would be.
> Not sure that this means - do you mean not firing the autokick of the current
> Right now the kid ignores its own messages, but ideally we would train
> the MCP to ignore such notices if they come from one of its (grandkids') PIDs.
> Certainly this is a TODO. We could probably simply have the KID send the
> MCP a quick NOTIFY on startup. Could also go the other way, and have the information
> about what PID did the kick bubble down to the CTL and then the KID, but
> that would be trickier.
>> * Replace "makedelta" with "cascade" and default to "auto"
>> * Let the triggers do the work as much as possible
> I'm okay with the automatic searching, although we would have to be careful to
> check that the sync is active, as well as some other edge cases. However,
> I've been thinking hard about the "always" triggers mentioned mostly
> on channel (the idea being we set delta to 'always' and thus we can simply
> rely on the trigger populating things rather than doing manual inserts
> in the code for makedeltas). I really don't think that will work, as 'always'
> triggers are just too dangerous. I'd much rather put the onus on Bucardo itself
> to emulate the trigger action, rather than violating the principle of least
> surprise by having 'SET session_replication_role = replica' NOT disable the
> Bucardo triggers. Even if we added some trickery to have them not do the inserts
> via a GUC setting or the like, just having them get called adds a lot of
> overhead for bulk loading, one of the main reasons people "turn off" the
> Bucardo delta triggers. If it was 'always', the only way to truly avoid
> them would be to do an ALTER TABLE which we really, really want to avoid
> as that is some heavy locking.
We discussed on #bucardo having triggers exit if session_replication_role is set to "local", and to set it to that for bulk inserts.
> It's a shame that session_replication_role is as limited as it is, as having
> a little more discretion would be nice (e.g. setting a trigger to always fire
> unless session_replication_role X is enabled, but not on normal 'replica').
> The current origin/local settings are too weak and advisory to really matter
> here. Something to perhaps iron out and suggest to -hackers someday.
Yeah, my suggestion to use "local" is a bit of a hack, though, frankly, more transparent if you're reading the code.
More information about the Bucardo-general