Bucardo replication
Postgres masters, other slaves
Josh Williams
End Point Corporation
Bucardo
- Stand-alone replication system
- Push from Postgres to other databases
- Trigger-based, asynchronous
- Master-slave{1,}, master-master
- B5: Master{1,} - slave{0,}
- Master-master-master-slave-slave
State of Bucardo
- Started at Backcountry.com in 2002. Postgres 7.2!
- Bucardo released publically as version 3.0.6 in 2007
- Bucardo version 4.0.0 released in 2009
- Bucardo 4 is the current release; little new development
- Bucardo 5 is in beta, actively developed last three years
- B5 has LOTS of new features and improvements over B4
- Larger than the v3 to v4 jump
Bucardo strengths
- Low requirements (Postgres 8.3, Perl, DBD::Pg, DBIx::Safe, pl/perlu)
- No changes to Postgres or its configuration
- Only one daemon, can be run from anywhere
- Fast. Handles bad networks well.
- Good command line and monitoring
- East to install and setup
Bucardo strengths
- Targets (slaves) are not locked
- Very configurable
- Customcode
- Multiple target types (MongoDB, Oracle, etc.)
- Have we mentioned multi-master?
Bucardo limitations
- No DDL
- No failover
- No delta replication without primary keys - fullcopy only
- Targets (slaves) are not locked (still writeable, unlike Slony)
- Daemon will not run on Windows (but DBs can)
- Bloat (partially solved in B5)
Bucardo 5
- Overhauled code. 'Swap' syncs much, much faster. No 'swap' or 'pushdelta'
- Asynchronous calls everywhere
- As many masters as you want. Push to other RDBMSs
- No q table, separate delta/track tables = less bloat problems, faster queries
- Controllable log levels in the bucardo log file
- Better command-line, better monitoring
- Uses new Postgres stuff: application_name, unlogged tables
Bucardo use cases
- Distribute read load (i.e. scaling)
- Redundant master servers
- Data warehousing
- Data reporting
- Upgrading Postgres versions (8.1 to 9.1)
- Joining the NoSQL craze (e.g. a MongoDB target)
- Migrating from other RDBMSs (e.g. if one app still needs Oracle)
Installing the software
- May be in your packaging system, but beware old versions
- git clone git://bucardo.org/bucardo.git; cd bucardo
- OR: Download tarball from bucardo.org; untar and cd bucardo
- perl Makefile.PL; make; sudo make install
Installing the daemon
- $ bucardo install
- $ bucardo show all
Adding databases
- $ bucardo add db PG1 db=sales
- $ bucardo add db ORA1 dbtype=oracle conn="sid=o" dbuser=sysdba
- $ bucardo add db PG2 db=sales dbhost=example.com
- $ bucardo add db PG3 db=sales_dw dbport=6543
Adding tables
- $ bucardo add tables
- $ bucardo add tables pkonly
- If not database is given, picks the 'best' source
- $ bucardo add tables pkonly db=PG2 herd=h1
- $ bucardo add table public.inventory product
- $ bucardo add all sequences
dbgroups, herds, and syncs
- dbgroup: named group of databases, with assigned roles
- $ bucardo add dbgroup dbg1 PG1:source PG2:source PG3:target ORA1:target
- $ bucardo add dbgroup dbg1 PG1 PG2:source PG3 ORA1
- herd: named group of tables and sequences
- $ bucardo add herd h1 inventory product coupons
- sync: named replication event
Adding syncs
- $ bucardo add sync alpha herd=h1 dbs=PG1,PG2:source,PG3,ORA1
- $ bucardo validate alpha
- pl/perlu: source gets schema, functions, triggers, tables, indexes
Starting it up
- $ bucardo start
- tail -f log.bucardo
- $ bucardo status
- $ bucardo status alpha
- $ bucardo kick alpha
$ bucardo status alpha
======================================================================
Last good : Sep 10, 2011 16:16:14 (time to run: 1s)
Rows deleted/inserted : 26 / 26
Last bad : Aug 26, 2011 08:44:19 (time until fail: 1s)
Sync name : alpha
Current state : Good
Source herd/database : h1 / PG1
Tables in sync : 7
Status : active
Check time : none
Overdue time : 00:00:00
Expired time : 00:00:00
Stayalive/Kidsalive : yes / yes
Rebuild index : No
Ping : yes
Onetimecopy : No
Post-copy analyze : Yes
Last error: : Failed : DBD::Pg::db pg_result failed: ERROR: relation "public.regions" does not exist
LINE 1: /* Bucardo 5.0.0 */DELETE FROM public.regions WHERE name = A...
^ at /usr/local/lib/perl5/site_perl/5.10.0/Bucardo.pm line 7482. Line: 3787 Main DB state: ? Error: none DB PG1 state: ? Error: none DB PG2 state: 42P01 Error: 7 (KID 29510)
======================================================================
Recap
- $ bucardo install
- $ bucardo add db PG1 db=sales
- $ bucardo add db PG2 db=sales dbhost=example.com
- $ bucardo add tables pkonly herd=h1
- $ bucardo add sync alpha herd=h1 dbs=PG1,PG2
- $ bucardo start
Sync kicks
- Trigger (statement-level on all tables in the sync)
- Timeout (you pick the time)
- Manual kick (command line, via cron, from another script, even customcode)
How it works
- Each table involved in a sync has two triggers (source only, targets untouched)
- One trigger adds the primary key that has changed to a 'delta' table
- The other trigger signals bucardo that something has changed ('kicks' the sync)
How it works
- Bucardo queries all the deltas to build a list of affected rows
- If more than one source, figure out which one will be authoritative
- For all pkeys that were changed, delete on all other databases, then COPY over
- No need to care about if the original was an insert, update, or delete
- Yes, truncate is handled too (but differently)
How it works
- All sources are polled
- If conflicts, we pick a winner
- Push from winning source for each row to all other dbs
- Triggers, rules, foreign keys disabled
Multimaster in real life
- Replication happens quick. Less chance of conflicts
- Staggered sequences (e.g. unique across all servers)
- Default conflict handler usually fine
customname
- Target table names can be changed
- Globally, per-sync, or per-database
- $ bucardo add customname sales q4sales db=ora1
- $ bucardo add customname sales internal.salez sync=abc
- Source cannot be renamed
customcols
- Target columns names can be changed!
- Globally, per-sync, or per-database
- Use a SELECT clause - rename, remove, or invent columns
- $ bucardo add customcols sales "SELECT id, now() AS ctime, 42 AS answer" db=ora1
- Will check that the target has the columns the SELECT us outputting
- Must still have the primary key, but all else is fair game
- A must for key/value targets!
About that DDL...
- So triggers on system tables
- Very easy: indexes, rules, triggers
- Easy: new columns, changed column types, dropped columns
- Hard: changing primary keys
Flatfiles, aka log shipping
- $ bucardo add db foo type=flatfile
- Unique timestamped SQL dumps - DELETE + COPY
- Ship via slow methods
- Audit trail
Exceptions
- Things Can Go Wrong
- Perl to the rescue
- Your routine's duty: fix it all up, ask to try again
Conflicts
- Multi-master conflict = same primary key changed on two or more servers
- Built-in method: latest wins (all txns across all databases)
- Custom methods: you decide, but there can be only one winning database
- Not needed the great majority of the time!
Customcode
- Exceptions and conflict handlers are two examples
- Can also run before the sync, before the copy, after copy, after sync
- After-sync is handy for cache invalidation
- Before-sync can even tell the sync not to run: like an advanced trigger
Future Sources
- Oracle - straightforward
- MySQL - ugly but doable
- Everything else - big unknown
Current Targets
- Postgres
- Flatfile (log shipping)
- MongoDB
- SQLite
- MySQL
- Oracle
- Drizzle
Future Targets
- Redis, memcached, others
- DB2, Firebird, others
- By popular request?
Contact
- http://bucardo.org
- Mailing lists: -general, -announce, -commit
- Freenode: #bucardo
- Bugzilla: http://bucardo.org/bugzilla/