[Bucardo-general] Automatic Cascading Replication

David E. Wheeler david at justatheory.com
Tue Dec 18 23:42:13 UTC 2012


On Dec 13, 2012, at 7:48 PM, Greg Sabino Mullane <greg at endpoint.com> wrote:

> Just a heads up, there is a lot of conversation at times on 
> the #bucardo IRC channel on freenode, so make sure you are 
> visiting there for some interesting (but low volume) traffic!

Yeah, and I’m slow to respond here. :-)

> I don't think this will work out. If we did that, we'd have some problems:
> 
> * Our main 'get deltas' query would no longer work; it would have to become a 
> handful of queries, one per database. Additionally, it would add a lot of 
> complexity to push_rows. Because each DB might demand a different select 
> of distinct rows from bucardo_delta, we would also have to scan the 
> source database(s) multiple times, worst-case scenario once per 
> target DB. Ugh.
> 
> * Since the data should already be in a logically self-contained set, 
> it should never be the case that we save anything by letting some other 
> sync do the work for us, because it's either going to violate our 
> existing assumptions of which tables should get replicated as a group, 
> or be the same as us, in which case we don't win anything.
> 
> * If we really want to avoid over-copying, there are probably better 
> approaches, but I am not convinced this is much of a problem, as 
> makedelta is 99% of the time used to replicate in a non-overlapping 
> target fashion.

Well, let’s look at the use-case I have, then.

We’re setting up a multi-master configuration between two servers, A and B (in different data centers), for our customer information. There are 10 or so tables involved, and the multi-master stuff is perfect for that.

However, we have a bunch of other databases, used for varying purposes, that need access to a subset of the data from that system. For example, we have a project, let’s call it “frobisher,” that needs access to the "customers" table. Ideally, I would not change the existing MM config between the two databases, but would just add a new sync for "customers", with both of the main databases as sources and “frobisher” as a read-only target: `A:source B:source frobisher:target`.
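Concretely, something like this (just a sketch; the sync and relgroup names are mine, and the exact option spellings may differ between Bucardo versions, e.g. relgroup vs. the older herd):

    # Hypothetical names; assumes Bucardo 5-style "bucardo add" syntax.
    bucardo add relgroup customers_relgroup public.customers
    bucardo add sync customers_readonly relgroup=customers_relgroup \
        dbs=A:source,B:source,frobisher:target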

But this means that every delta in the "customers" table will be replicated between A and B *twice*: once by the existing multi-master sync and once by the new one.

I suppose the solution is to configure the multi-master replication between A and B to exclude "customers", and then to add a separate sync that covers just "customers", with A and B as sources and “frobisher” as the target. But I expect to have a bunch of these kinds of "copy to this read-only target" things going on, so the configuration might get to be a bit of a PITA.
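In other words, something like this (again, just a sketch with made-up names):

    # Assumes two relgroups already exist: main_minus_customers (the other
    # ~9 customer-system tables) and just_customers (only public.customers).
    # The second sync handles "customers" for A, B, and frobisher in a
    # single pass, so each delta moves between A and B only once.
    bucardo add sync main_mm relgroup=main_minus_customers dbs=A:source,B:source
    bucardo add sync customers_sync relgroup=just_customers \
        dbs=A:source,B:source,frobisher:target

Workable, but every different subset of tables destined for some read-only target means carving out another relgroup and another sync like that.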

I guess the simplest solution, given the current architecture, is to have a separate sync for every table. Kind of annoying, though.

>> As an aside: Ideally, when a table is being replicated with makedelta enabled, 
>> the deltas would be created, but the autokick, if there is one, is *not* fired. 
>> Any other sync autokicks would be.
> 
> Not sure what this means - do you mean not firing the autokick of the current 
> sync?

Yes.

> Right now the kid ignores its own messages, but ideally we would train 
> the MCP to ignore such notices if they come from one of its (grandkids') PIDs.
> Certainly this is a TODO. We could probably simply have the KID send the 
> MCP a quick NOTIFY on startup. Could also go the other way, and have the information 
> about what PID did the kick bubble down to the CTL and then the KID, but 
> that would be trickier.

Added: https://github.com/bucardo/bucardo/issues/34

>> * Replace "makedelta" with "cascade" and default to "auto"
>> * Let the triggers do the work as much as possible
> 
> I'm okay with the automatic searching, although we would have to be careful to 
> check that the sync is active, as well as some other edge cases. However, 
> I've been thinking hard about the "always" triggers mentioned mostly 
> on channel (the idea being we set delta to 'always' and thus we can simply 
> rely on the trigger populating things rather than doing manual inserts 
> in the code for makedeltas). I really don't think that will work, as 'always' 
> triggers are just too dangerous. I'd much rather put the onus on Bucardo itself 
> to emulate the trigger action, rather than violating the principle of least 
> surprise by having 'SET session_replication_role = replica' NOT disable the 
> Bucardo triggers. Even if we added some trickery to have them not do the inserts 
> via a GUC setting or the like, just having them get called adds a lot of 
> overhead for bulk loading, one of the main reasons people "turn off" the 
> Bucardo delta triggers. If it were 'always', the only way to truly avoid 
> them would be to do an ALTER TABLE, which we really, really want to avoid, 
> as that requires some heavy locking.

We discussed on #bucardo having the triggers exit early when session_replication_role is set to "local", and setting it to "local" for bulk inserts.
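Roughly this sort of thing (not Bucardo's actual trigger code, just a sketch of the idea against the "customers" table, with made-up names):

    -- Sketch only: not Bucardo's real delta trigger; names are made up.
    CREATE OR REPLACE FUNCTION delta_sketch() RETURNS trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        -- Bail out immediately during bulk loads run under
        -- SET session_replication_role = 'local'.
        IF current_setting('session_replication_role') = 'local' THEN
            RETURN NULL;
        END IF;
        -- The real trigger would record the changed row in its delta table here.
        RETURN NULL;
    END;
    $$;

    CREATE TRIGGER delta_sketch AFTER INSERT OR UPDATE OR DELETE
        ON public.customers FOR EACH ROW
        EXECUTE PROCEDURE delta_sketch();

    -- Fire even under Bucardo's own SET session_replication_role = replica,
    -- so makedelta still works while a sync is replicating:
    ALTER TABLE public.customers ENABLE ALWAYS TRIGGER delta_sketch;

A bulk load would then just do SET session_replication_role = 'local' first. The trigger still gets called for every row, so your point about the overhead of the calls themselves stands, but at least it does no work and writes nothing.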

> It's a shame that session_replication_role is as limited as it is, as having 
> a little more discretion would be nice (e.g. setting a trigger to always fire 
> unless session_replication_role X is enabled, but not on normal 'replica'). 
> The current origin/local settings are too weak and advisory to really matter 
> here. Something to perhaps iron out and suggest to -hackers someday.

Yeah, my suggestion to use "local" is a bit of a hack, though it is, frankly, more transparent if you're reading the code.

Best,

David



