[Bucardo-general] Newbie Questions

Thu Sep 13 22:15:21 UTC 2012

> What are these programs and what are their responsibilities?
> Master Control Program (MCP)

The main program. Receives requests and forks off controllers (CTL) to 
handle them. May also fork vacuum (VAC) processes. Does no direct sync 
work on its own. Responsible for communication with the outside world. 
Reads the bucardo database on startup.

> Controller

In charge of running a single sync (one named replication set). Will 
create one or more kids (KID) to do the actual work. In charge of 
corralling the kids and reporting things back to the MCP. May be 
short-lived or persistent (configurable).

> Kids

In charge of doing the actual work. Usually created with a specific 
mandate, such as "replicate from A to B for sync X". Only talks to 
the CTL that created it. Also configurable as to whether it exits 
after doing its work, or hangs around waiting for another job 
from the controller.

Bucardo 4 (B4) had multiple kids per controller (for multiple targets). 
Bucardo 5 (B5) does things a little differently, so currently it is one 
kid per controller. That may change. :)

> VAC (Not mentioned by `stop` command docs in the man page)

Internal vacuum process that is primarily responsible for keeping 
the bucardo_delta_* tables trimmed. Replaces the cron jobs needed 
in B4.
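To make the "who forks whom" relationships above concrete, here is a toy Python model of the process tree. It is only an illustration, not Bucardo code (Bucardo itself is Perl): the `Proc` class and the `build_process_tree` and `count_roles` helpers are hypothetical names, `kids_per_ctl=1` mirrors the B5 behaviour described above, and the single VAC child is an assumption, since the MCP only "may" fork one.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Proc:
    role: str                                    # "MCP", "CTL", "KID" or "VAC"
    job: str                                     # human-readable description
    children: list[Proc] = field(default_factory=list)


def build_process_tree(syncs: list[str], kids_per_ctl: int = 1) -> Proc:
    """Toy model: the MCP forks one CTL per sync, each CTL forks its KIDs."""
    mcp = Proc("MCP", "talks to the outside world, forks controllers")
    for sync in syncs:
        ctl = Proc("CTL", f"runs sync {sync}, reports back to the MCP")
        for n in range(1, kids_per_ctl + 1):
            ctl.children.append(Proc("KID", f"kid {n} doing the work for sync {sync}"))
        mcp.children.append(ctl)
    # Assumption: one VAC process trimming the bucardo_delta_* tables.
    mcp.children.append(Proc("VAC", "trims bucardo_delta_* tables"))
    return mcp


def count_roles(proc: Proc) -> dict[str, int]:
    """Count processes by role, walking the tree recursively."""
    counts = Counter([proc.role])
    for child in proc.children:
        counts.update(count_roles(child))
    return dict(counts)
```

For example, `count_roles(build_process_tree(["s1", "s2"]))` returns `{'MCP': 1, 'CTL': 2, 'KID': 2, 'VAC': 1}`. Passing `kids_per_ctl=2` would roughly mimic the B4 style of several kids per controller.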

> How many connections will Bucardo use?

To what? The MCP connects once to each database being replicated (that 
is, each database used by at least one active sync). Each CTL also 
connects, one per sync, and so does each KID. VAC processes may 
also connect from time to time. So a rough guideline is:

number of connections to target A = 2 + (syncs_using_A * 2)
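The same guideline as a small Python function, which can help when sizing connection limits or a pooler. This is a sketch: the breakdown in the docstring (one MCP connection, one for an occasional VAC, plus a CTL and a KID per sync) is our reading of the description above rather than something taken from Bucardo's source, and returning 0 for a database with no active syncs is an assumption based on the MCP only connecting to databases that have one. The function names are hypothetical.

```python
from __future__ import annotations


def estimated_connections(syncs_using_db: int) -> int:
    """Rough guideline for Bucardo connections to a single database.

    Assumed breakdown, inferred from the description above:
      1 for the MCP, 1 for an occasional VAC process,
      plus 2 per sync using the database: one CTL and one KID
      (B5 currently runs one kid per controller).
    """
    if syncs_using_db < 0:
        raise ValueError("syncs_using_db cannot be negative")
    if syncs_using_db == 0:
        return 0  # assumption: no active sync uses this db, so nothing connects
    return 2 + syncs_using_db * 2


def connections_per_db(syncs_per_db: dict[str, int]) -> dict[str, int]:
    """Apply the guideline to every database, e.g. {"A": 3} -> {"A": 8}."""
    return {db: estimated_connections(n) for db, n in syncs_per_db.items()}
```

For example, a database used by three syncs works out to 2 + 3 * 2 = 8 connections, so `connections_per_db({"A": 3, "B": 1})` gives `{"A": 8, "B": 4}`.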

> Can Bucardo connect to a connection pooler, e.g., pgBouncer?

Yes

> Is there a glossary for these terms?

I don't think so, other than the help system of 'bucardo'. A 
quick rundown:

>> dbs

Named databases, with connection information

>> dbgroups

Named group of databases. Can contain 0 or more databases

>> tables
>> sequences

Should be obvious. Both are stored internally in the "goat" table.

>> syncs

The main replication unit. A named replication event, containing 
information about what to replicate (e.g. a herd) from where to 
where (dbs and their roles). Also contains lots of other meta 
information.

>> herds

A group of tables and/or sequences. In other words, a bunch of goats.
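One way to picture how these pieces fit together is a tiny Python data model: dbs are grouped into a dbgroup, tables and sequences (goats) are grouped into a herd, and a sync ties a herd to a dbgroup. This is a hypothetical sketch, not Bucardo's schema. The `conninfo` field is made up, and storing each db's role ("source" or "target") on its dbgroup membership is an assumption made for the illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Db:                  # "dbs": a named database with connection information
    name: str
    conninfo: str          # hypothetical field, e.g. "host=... dbname=..."


@dataclass
class DbGroup:             # "dbgroups": zero or more databases
    name: str
    members: list[tuple[Db, str]] = field(default_factory=list)  # (db, role)


@dataclass
class Goat:                # a table or sequence, as stored in the "goat" table
    schema: str
    name: str
    kind: str = "table"    # or "sequence"


@dataclass
class Herd:                # "herds": a bunch of goats
    name: str
    goats: list[Goat] = field(default_factory=list)


@dataclass
class Sync:                # what to replicate (a herd), from where to where
    name: str
    herd: Herd
    dbgroup: DbGroup

    def dbs_with_role(self, role: str) -> list[str]:
        """Names of the databases playing the given role in this sync."""
        return [db.name for db, r in self.dbgroup.members if r == role]


# Illustrative example: replicate one table and its sequence from A to B.
example = Sync(
    "s1",
    Herd("h1", [Goat("public", "orders"),
                Goat("public", "orders_id_seq", "sequence")]),
    DbGroup("g1", [(Db("A", "dbname=a"), "source"),
                   (Db("B", "dbname=b"), "target")]),
)
```

With this example, `example.dbs_with_role("source")` is `["A"]` and `example.dbs_with_role("target")` is `["B"]`.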

>> customnames

Mapping of an original name to a new one. For example, to replicate from 
one schema to a differently named one, or even to a different table name.

>> customcols

Allows an override of what columns to replicate - rather than "SELECT *", 
you can plug in whatever you want here, including adding things not 
in the source column list!
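The effect of customnames and customcols can be sketched as two lookups with a fallback: use the override if one exists, otherwise keep the original table name or plain `SELECT *`. This Python snippet only illustrates the idea and is not how Bucardo stores or applies these overrides. `CUSTOMNAMES`, `CUSTOMCOLS`, `target_table` and `select_for` are hypothetical names, and the example mappings are made up.

```python
from __future__ import annotations

# Hypothetical override tables, keyed by fully qualified source table name.
# These dicts are illustrative only; they are not Bucardo's storage format.
CUSTOMNAMES: dict[str, str] = {"public.orders": "archive.orders_copy"}
CUSTOMCOLS: dict[str, str] = {"public.orders": "id, total, now() AS copied_at"}


def target_table(source: str, customnames: dict[str, str]) -> str:
    """Name to write to on the target: the mapped name if one exists."""
    return customnames.get(source, source)


def select_for(source: str, customcols: dict[str, str]) -> str:
    """SELECT used to read source rows: custom column list or plain '*'."""
    columns = customcols.get(source, "*")
    return f"SELECT {columns} FROM {source}"
```

With these mappings, `select_for("public.orders", CUSTOMCOLS)` yields `SELECT id, total, now() AS copied_at FROM public.orders` (note the added `copied_at` column that does not exist in the source), while an unmapped table such as `public.items` keeps its own name and falls back to `SELECT *`.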

>> customcode

Perl subroutines that can be run at certain points in the sync process. Some 
handle exceptions, some handle conflicts, and some just run at certain times 
with no expectation of functionality (e.g. before we drop triggers).

>> ping

Generally means a sync-level attribute stating whether or not changes to the 
tables in the sync will immediately fire off a NOTIFY. Sometimes this is not 
desired if the tables are very busy (in which case it is usually better to simply 
have the sync check for activity every X seconds), or the sync only needs to run 
at very specific times.

> Are there Linux start scripts to keep this thing running in production?

There is a bucardo.rc file inside scripts/. Other than that, it's just 
`bucardo start` and go. Bucardo is tough to kill completely: it will try 
hard to resurrect itself.

> Has anyone designed a cross-data center multi master 
> replication configuration?

I'm sure someone has, but I don't know if there is anything public and generic out there.

> What should such a thing look like?

Tell us your parameters! :)

> Can I have multiple Bucardos running, or is it a single point of failure?

It is generally a single point of failure, although it is very easy to keep 
a standby Bucardo around, as the actual 'bucardo' database is very small 
and does not change often.

> Is there some sort of failover support in case Bucardo dies?

Nothing built in. You can use LifeKeeper or something similar. As most people 
run Bucardo on one of the servers involved in the replication, losing just 
Bucardo is a rare event - there are usually bigger problems. But as stated 
above, even a warm standby of the bucardo database would go a long way.

> Where this is headed: We have two data centers, and want to be able to 
> shut one down (or have it go down) without interfering with the other 
> at all. I imagine that if only one Bucardo can be running to replicate 
> between masters in each data center, then if the box with that Bucardo 
> falls over, we need some way to have another come up ASAP. I figure it's 
> less of an issue if we take stuff down for maintenance, as it will 
> just sync when it comes back up. AMIRITE?

Yeah, you can bring Bucardo up and down at will. It simply means the list 
of deltas (changed rows) will build up until the next time a Bucardo 
can get in there and replicate them. With B4, large buildups of deltas 
without Bucardo running were a serious bottleneck, but B5 is much more 
efficient, so it is not as big a problem. Still, you probably don't want to 
go *too* long without pushing over your changes.

-- 
Greg Sabino Mullane greg at endpoint.com
End Point Corporation
PGP Key: 0x14964AC8