[Bucardo-general] Newbie Questions

Mon Sep 17 22:17:57 UTC 2012

On Sep 13, 2012, at 3:15 PM, Greg Sabino Mullane <greg at endpoint.com> wrote:

>> What are these programs and what are their responsibilities?
>> Master Control Program (MCP)
> 
> The main program. Receives requests and forks off controllers (CTL) to 
> handle them. Also may fork vacuum (VAC) processes. Does no other 
> direct sync work on its own. Responsible for communication with 
> the outside world. Reads the bucardo database on startup.

So it's the thing that's LISTENing?

Since it reads the DB on startup, does that mean I have to kick it if I add new tables to be replicated?

>> Controller
> 
> In charge of running a single sync (one named replication set). Will 
> create one or more kids (KID) to do the actual work. In charge of 
> corralling the kids and reporting things back to the MCP. May be 
> short-lived or persistent (configurable).

Oh? Which is preferable, and why? I would assume that persistent would be preferable in a high transaction environment, less so in a low transaction environment. And by "transactions" I mean "replicable events".

>> Kids
> 
> In charge of doing the actual work. Usually created with a specific 
> mandate, such as "replicate from A to B for sync X". Only talks to 
> the CTL that created it. Also configurable as to whether it exits 
> after doing its work, or hangs around waiting for another job 
> from the controller.

Again, I assume the choice should be determined by the volume of replication events.

> Bucardo 4 (B4) had multiple kids per controller (for multiple targets). 
> Bucardo 5 (B5) does things a little differently so currently it is one 
> kid per controller. That may change. :)

Hrm. Sounds like maybe it no longer needs to be a separate process, eh?

>> VAC (Not mentioned by `stop` command docs in the man page)
> 
> Internal vacuum process that is primarily responsible for keeping 
> the bucardo_delta_* tables trimmed. Replaces the cronjobs needed 
> in B4.

Nice.

>> How many connections will Bucardo use?
> 
> To what? For each database to be replicated (that has at least one 
> active sync), the MCP will connect. The CTL will also connect - 
> one per sync. Each KID will connect as well. VAC processes may 
> also connect from time to time. So a rough guideline is:
> 
> number of connections to target A = 2 + (syncs_using_A * 2)

Yeah, so I can see where connection pooling would be useful.
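
To get a feel for the numbers, here is a minimal Python sketch of Greg's rough guideline above. The function name is made up for illustration, and reading the constant 2 as "MCP plus an occasional VAC" is my assumption; the thread only gives the formula itself.

```python
def estimated_connections(syncs_using_db: int) -> int:
    """Rough guideline from the thread: 2 + (syncs_using_db * 2).

    Hypothetical helper, not part of Bucardo. It only restates the
    back-of-the-envelope formula; real connection counts may differ.
    """
    if syncs_using_db < 0:
        raise ValueError("sync count cannot be negative")
    # Assumed breakdown: a baseline of 2 (MCP plus an occasional VAC),
    # then one CTL and one KID connection for each sync using the database.
    return 2 + syncs_using_db * 2


if __name__ == "__main__":
    for syncs in (1, 3, 10):
        print(f"{syncs} sync(s): about {estimated_connections(syncs)} connections")
```

So even a handful of syncs against one database adds up quickly, which is where a pooler would help.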

>> Is there a glossary for these terms?
> 
> I don't think so, other than the help system of 'bucardo'. A 
> quick rundown:

Thanks, very useful. I've added a Wiki page with this stuff:

  http://bucardo.org/wiki/Bucardo/Documentation/Glossary

Note that I've added "goat".

>> Are there Linux start scripts to keep this thing running in production?
> 
> There is a bucardo.rc file inside scripts/. Other than that, it's just 
> `bucardo start` and go. Bucardo is tough to kill completely: it will try 
> hard to resurrect itself.

Great, thanks.

>> Has anyone designed a cross-data center multi master 
>> replication configuration?
> 
> I'm sure, but I don't know if there is anything public and generic out there.

Doesn't seem like much necessarily needs to be different.

>> What should such a thing look like?
> 
> Tell us your parameters! :)

Two masters, one in Seattle and one in Portland. Our main requirement is to be able to do maintenance in one data center without the other one needing to go down. It's okay if replication is paused during maintenance, as long as they sync back up quickly when it's done. Relatively small databases (up to around 50G to start) with relatively little transaction volume (a few thousand inserts, updates, and deletes per day).

>> Can I have multiple Bucardos running, or is it a single point of failure?
> 
> It is generally a single point of failure, although it is very easy to keep 
> a standby Bucardo around as the actual 'bucardo' database is very small 
> and does not change often.

I guess I can have Bucardo replicate its own database, eh? That would keep things pretty simple.

>> Is there some sort of failover support in case Bucardo dies?
> 
> Nothing built in. You can use LifeKeeper or something similar. As most people 
> run Bucardo on one of the servers involved in the replication, losing just 
> Bucardo is a rare event - there are usually bigger problems. But as stated 
> above, even a warm standby of the bucardo database would go a long way.

Yeah, that's about what I figured. Failover is much simpler with multiple masters than it is with a hot standby, for example. Just update DNS and go.

> Yeah, you can bring Bucardo up and down at will. It simply means the list 
> of deltas (changed rows) will build up until the next time a Bucardo 
> can get in there and replicate them. With B4, a large buildup of deltas 
> without Bucardo running was a serious bottleneck, but B5 is much more 
> efficient, so it is not as big a problem. Still, you probably don't want to 
> go *too* long without pushing over your changes.

Sure. But if there isn't too much transaction volume, a downtime of a few hours or even a day wouldn't be a big deal, I assume.

How stable is 5.0 if I were to target it? Do you have a release date in mind? Seems like it has been close for a long time…

Thanks,

David