[Bucardo-general] Newbie Questions
Greg Sabino Mullane
greg at endpoint.com
Thu Sep 13 22:15:21 UTC 2012
> What are these programs and what are their responsibilities?
> Master Control Program (MCP)
The main program. Receives requests and forks off controllers (CTL) to
handle them. Also may fork vacuum (VAC) processes. Does no other
direct sync work on its own. Responsible for communication with
the outside world. Reads the bucardo database on startup.
> Controller
In charge of running a single sync (one named replication set). Will
create one or more kids (KID) to do the actual work. In charge of
corralling the kids and reporting things back to the MCP. May be
short-lived or persistent (configurable).
> Kids
In charge of doing the actual work. Usually created with a specific
mandate, such as "replicate from A to B for sync X". Only talks to
the CTL that created it. Also configurable as to whether it exits
after doing its work, or hangs around waiting for another job
from the controller.
Bucardo 4 (B4) had multiple kids per controller (for multiple targets).
Bucardo 5 (B5) does things a little differently, so currently it is one
kid per controller. That may change. :)
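To make the hierarchy concrete, here is a minimal Python sketch of the process tree described above. It is only a model for counting processes, not Bucardo code (Bucardo itself is written in Perl). The class names, build_tree, and process_count are hypothetical names made up for illustration, and VAC processes are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Kid:
    sync: str  # the sync this kid does the actual work for


@dataclass
class Controller:
    sync: str  # one controller per named sync
    kids: List[Kid] = field(default_factory=list)


@dataclass
class MCP:
    controllers: List[Controller] = field(default_factory=list)


def build_tree(sync_names: List[str], kids_per_controller: int = 1) -> MCP:
    """Model the MCP forking one CTL per sync, each with its kids.

    kids_per_controller=1 mirrors the B5 behavior described above;
    B4 could have had more (one per target).
    """
    mcp = MCP()
    for name in sync_names:
        ctl = Controller(sync=name)
        ctl.kids = [Kid(sync=name) for _ in range(kids_per_controller)]
        mcp.controllers.append(ctl)
    return mcp


def process_count(mcp: MCP) -> int:
    """Count the MCP itself plus every CTL and KID (VAC not modeled)."""
    return 1 + sum(1 + len(ctl.kids) for ctl in mcp.controllers)
```

With two syncs and the B5 default of one kid each, process_count(build_tree(["x", "y"])) counts one MCP, two CTLs, and two KIDs.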
> VAC (Not mentioned by `stop` command docs in the man page)
Internal vacuum process that is primarily responsible for keeping
the bucardo_delta_* tables trimmed. Replaces the cronjobs needed
in B4.
> How many connections will Bucardo use?
To what? For each database to be replicated (that has at least one
active sync), the MCP will connect. The CTL will also connect -
one per sync. Each KID will connect as well. VAC processes may
also connect from time to time. So a rough guideline is:
number of connections to target A = 2 + (syncs_using_A * 2)
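As a quick way to plug numbers into that guideline, here is a small Python sketch. The function name is hypothetical. Reading the constant 2 as MCP plus VAC, and the per-sync 2 as one CTL plus one KID, is my interpretation of the paragraph above, not something Bucardo reports.

```python
def estimate_connections(syncs_using_db: int) -> int:
    """Rough guideline: 2 + (syncs_using_db * 2) connections to one database.

    Assumed breakdown (an interpretation of the text, not measured):
      2            -> MCP plus an occasional VAC connection
      2 per sync   -> one CTL and one KID for each sync
    """
    if syncs_using_db < 0:
        raise ValueError("syncs_using_db must be non-negative")
    if syncs_using_db == 0:
        # No active sync uses this database, so per the text
        # the MCP does not connect to it at all.
        return 0
    return 2 + syncs_using_db * 2
```

For example, a database used by three syncs works out to 2 + (3 * 2) = 8 connections, which is the number to keep in mind when sizing max_connections or a pooler.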
> Can Bucardo connect to a connection pooler, e.g., pgBouncer?
Yes
> Is there a glossary for these terms?
I don't think so, other than the help system of 'bucardo'. A
quick rundown:
>> dbs
Named databases, with connection information
>> dbgroups
Named group of databases. Can contain 0 or more databases
>> tables
>> sequences
Should be obvious. Both are stored internally in the "goat" table.
>> syncs
The main replication unit. A named replication event, containing
information about what to replicate (e.g. a herd) from where to
where (dbs and their roles). Also contains lots of other meta
information.
>> herds
A group of tables and/or sequences. In other words, a bunch of goats.
>> customnames
Mapping of an original name to a new one. For example, to replicate from
one schema to a differently named one, or even to a different table name.
>> customcols
Allows an override of what columns to replicate - rather than "SELECT *",
you can plug in whatever you want here, including adding things not
in the source column list!
>> customcode
Perl subroutines that can be run at certain points in the sync process. Some
handle exceptions, some handle conflicts, and some just run at certain times
with no expectation of functionality (e.g. before we drop triggers).
>> ping
Generally means a sync-level attribute stating whether or not changes to the
tables in the sync will immediately fire off a NOTIFY. Sometimes this is not
desired if the tables are very busy (in which case it is usually better to simply
have the sync check for activity every X seconds), or the sync only needs to run
at very specific times.
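To show how these terms hang together, here is a rough Python sketch of the relationships. It is not Bucardo's actual schema: the class and field names, the role strings in the usage example, and the wakeup_policy helper are all hypothetical and only illustrate the glossary above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Goat:
    name: str  # e.g. "public.orders"
    kind: str  # "table" or "sequence"


@dataclass
class Herd:
    name: str
    goats: List[Goat] = field(default_factory=list)  # a bunch of goats


@dataclass
class Sync:
    name: str
    herd: Herd                    # what to replicate
    db_roles: Dict[str, str]      # db name -> role (from where to where)
    ping: bool = True             # do table changes fire a NOTIFY?
    check_seconds: Optional[int] = None  # "check every X seconds" instead


def wakeup_policy(sync: Sync) -> str:
    """Describe how a sync would learn there is work to do."""
    if sync.ping:
        return "notify"
    if sync.check_seconds is not None:
        return f"poll every {sync.check_seconds}s"
    # Neither ping nor a timer: the sync only runs when asked to.
    return "manual"
```

A busy table might use Sync("x", herd, {"A": "source", "B": "target"}, ping=False, check_seconds=30), trading immediate notification for a periodic check.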
> Are there Linux start scripts to keep this thing running in production?
There is a bucardo.rc file inside scripts/. Other than that, it's just
`bucardo start` and go. Bucardo is tough to kill completely: it will try
hard to resurrect itself.
> Has anyone designed a cross-data center multi master
> replication configuration?
I'm sure, but I don't know if there is anything public and generic out there.
> What should such a thing look like?
Tell us your parameters! :)
> Can I have multiple Bucardos running, or is it a single point of failure?
It is generally a single point of failure, although it is very easy to keep
a standby Bucardo around as the actual 'bucardo' database is very small
and does not change often.
> Is there some sort of failover support in case Bucardo dies?
Nothing built in. You can use LifeKeeper or something similar. As most people
run Bucardo on one of the servers involved in the replication, losing just
Bucardo is a rare event - there are usually bigger problems. But as stated
above, even a warm standby of the bucardo database would go a long way.
> Where this is headed: We have two data centers, and want to be able to
> shut one down (or have it go down) without interfering with the other
> at all. I imagine that if only one Bucardo can be running to replicate
> between masters in each data center, then if the box with that Bucardo
> falls over, we need some way to have another come up ASAP. I figure it's
> less of an issue if we take stuff down for maintenance, as it will
> just sync when it comes back up. AMIRITE?
Yeah, you can bring Bucardo up and down at will. It simply means the list
of deltas (changed rows) will build up until the next time a Bucardo
can get in there and replicate them. With B4, a large build-up of deltas
without Bucardo running was a serious bottleneck, but B5 is much more
efficient, so it is not as big a problem. Still, you probably don't want to
go *too* long without pushing over your changes.
--
Greg Sabino Mullane greg at endpoint.com
End Point Corporation
PGP Key: 0x14964AC8