[Bucardo-general] Bucardo weird errors on buzy system

Wed Feb 25 18:21:28 UTC 2015

On Fri, Feb 20, 2015 at 05:27:32PM +0100, sym39 wrote:
> Hello all,
> 
> I am using bucardo 5.1.2 to replicate multiple groups of tables
> using multimaster mode - currently 2 machines (32 cores, 16 GB), but
> may be more in the future to make the application more scalable (lot
> of pg clients).
> 
> A group of tables is related to a specific event and consists of 10
> tables and sequences. Two of those tables are updated frequently:
> approximately 20 new rows are inserted and 20 or fewer rows deleted
> each second.
> 
> The main application creates a fixed number of events, not more than
> 100 at the early stage of the application; when a new event is
> created, an external program creates the corresponding schema in all
> machines and calls bucardo to create a sync for those new tables and
> sequences.
> 
> bucardo sync are created like this :
> bucardo add sync sync_xxx db_group=dbgroup_xxx relgroup=relgroup_xxx
> conflict_strategy=bucardo_latest autokick=1

If the tables are that busy, you may want to set autokick=0 and rely 
on some other means to start the sync, such as setting it to run 
every 10 seconds. This will reduce the overhead of having a trigger 
on each table.
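
A minimal sketch of that change, assuming the sync is named sync_xxx as in 
the quoted command and that a 10-second checktime suits the workload (both 
are placeholders; check the parameter names against the documentation for 
your installed version). It only prints the command unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Sketch only: sync_xxx and the 10-second interval are assumed placeholders.
SYNC_NAME="sync_xxx"
CHECKTIME_SECONDS=10
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode (the default); execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop kicking the sync from table triggers and have it check for
# changes on a timer instead.
disable_autokick() {
    run bucardo update sync "$SYNC_NAME" autokick=0 "checktime=$CHECKTIME_SECONDS"
}

disable_autokick
```

Depending on the version, the sync may need to be reloaded or Bucardo 
restarted before the new settings take effect.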

> 1) How scalable is bucardo : In other words, is there a sync limit
> in bucardo, that could make a solution with lot of syncs not
> scalable? For example, are syncs independent or not / dependent of a
> process that controls all syncs, ie, if one process is blocked, does
> it have impacts on others?

They are *mostly* independent. As long as there are no tables shared among 
the syncs, things should run fairly smoothly. There is no sync limit 
per se, but eventually you may see problems with the sheer number of 
notices and perhaps KID processes.

> 3) Is there a way to make the syncs more responsive? for example,
> the kid can be created with options "checktime", "lifetime",
> "maxkicks", "overdue" or "expired", but I am not sure to understand
> the benefits of those options.

checktime controls how often to check if anything has changed. Combined 
with turning autokick off, it can increase performance on busy syncs. 
lifetime and maxkicks control how often a sync is restarted with fresh 
CTL and KID processes. This is not normally necessary, but can help 
if you see a memory leak. overdue and expired are used for tying in 
with monitoring tools such as Nagios and do not affect Bucardo in any way.

> 4) I notice there are global options to control the bucardo children
> processes - for example 'ctl_checkonkids_time'. Can it help to
> restart erroneous processes more quickly ?

In theory, yes, but it is already quite low, and Bucardo should be able 
to restart itself without such checks. If you see it not doing so, or 
you think it takes too long to do so, mail me the Bucardo logs and 
I can see what is going on.

> Last question, maybe related or not: I notice that some syncs
> sometimes become inactive. After that, I find no way to make them
> work again: using bucardo activate XXX does not solve it, stopping
> / restarting the daemon does not help, and nothing special is
> present in the logs to explain what is wrong. So why can a sync
> become inactive, and what should I do in this case?

If a database involved in the sync is inactive, the sync will go 
inactive as well. The logs should be giving some clue as to what 
is going on. Try bumping the log_level up (perhaps to DEBUG) and 
see if that gives you a better message.
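
A rough sketch of that troubleshooting sequence, assuming a sync named 
sync_xxx (a placeholder) and the stock bucardo command-line tool. It only 
prints the commands unless DRY_RUN=0 is set, so they can be reviewed first:

```shell
#!/bin/sh
# Sketch only: sync_xxx is an assumed placeholder for the affected sync.
SYNC_NAME="sync_xxx"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode (the default); execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

debug_sync() {
    # List the databases; the output should show whether one is inactive.
    run bucardo list dbs
    # Raise the log verbosity, then ask the daemon to reload its config.
    run bucardo set log_level=DEBUG
    run bucardo reload config
    # Show the current state of the affected sync.
    run bucardo status "$SYNC_NAME"
}

debug_sync
```

Remember to set log_level back afterwards, as DEBUG output can be large 
on a busy system.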

-- 
Greg Sabino Mullane greg at endpoint.com
End Point Corporation
PGP Key: 0x14964AC8