[Bucardo-general] Initial population of collections from large PostgreSQL tables

Ali Asad Lotia ali at anobii.com
Fri Apr 27 13:39:48 UTC 2012


Hello All,
I'm currently working on moving some large-ish tables from PostgreSQL to
mongo using the Bucardo 5 beta, and I'm running into problems with tables
that have several million rows. The current Bucardo version is 4.99.4, and
the sync is configured with PostgreSQL 9.1 as the source and MongoDB 2.0.4
as the target.

The bucardo setup:
[lotia at dbhack bucardo]$ ./bucardo list dbs
Database: anobii  Type: postgres  Status: active  Conn: psql -p  -U bucardo -d anobii
Database: mongo   Type: mongo     Status: active
[lotia at dbhack bucardo]$ ./bucardo list dbgroups
Database group: tgroup  Members: anobii:source mongo:target
[lotia at dbhack bucardo]$ ./bucardo list sync
Sync: mongotest  Herd: therd DB group tgroup: anobii (source) mongo (target)  [Active]

A table with 5.5 million rows (5585966) and 6 columns (4 int, 2 smallint)
fails to sync, and the following information is recorded in the bucardo
and mongodb logs respectively.

The initial sync to mongo fails when I "touch" each row in one of the
aforementioned tables in the source database, as described at
http://blog.endpoint.com/2011/06/mongodb-replication-from-postgres-using.html
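For clarity, the "touch" is just a no-op UPDATE so that Bucardo's triggers
mark every existing row as changed; something along these lines, where the
column name is only illustrative:

    UPDATE author_item SET item_id = item_id;  -- no-op touch; column name made up

The sync then fails with the following error reported in the bucardo log: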
(11481) [Fri Apr 27 12:28:27 2012] KID Warning! Aborting due to exception
for public.author_item:? Error was can't get db response, not connected at
/usr/lib64/perl5/vendor_perl/MongoDB/Collection.pm line 529.
DBI::db=HASH(0x1ce67f0)->disconnect invalidates 1 active statement handle
(either destroy statement handles or call finish on them before
disconnecting) at Bucardo.pm line 2188.

The error reported in the mongo log is:
Fri Apr 27 12:28:26 [conn128] recv(): message len 187968085 is too large187968085

From my understanding, the amount of data we are sending to mongo in the
query generated by the sync is too large for MongoDB to handle: 187968085
bytes across 5585966 rows works out to roughly 34 bytes per row, so it
looks like the entire table is being sent to mongo as a single wire
message, which mongod then rejects. If my understanding is correct, is
there a way to get the sync to divide the data up into multiple smaller
messages to mongo that don't exceed the maximum defined message length?
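
In case it helps clarify what I'm after, here is a rough sketch of the
kind of chunking I have in mind. This is not Bucardo code: the batch size
and the fetch_next_row() stub are placeholders, and I'm just using the
plain MongoDB Perl driver directly:

    #!/usr/bin/env perl
    # Rough sketch only: insert rows in fixed-size batches so that no
    # single wire message to mongod exceeds the maximum message length.
    use strict;
    use warnings;
    use MongoDB;

    my $conn = MongoDB::Connection->new(host => 'localhost', port => 27017);
    my $coll = $conn->get_database('anobii')->get_collection('author_item');

    sub fetch_next_row {
        # Placeholder: in reality rows would come from the PostgreSQL side.
        return undef;
    }

    my $batch_size = 10_000;  # tune so each batch stays well under the limit
    my @batch;
    while (my $row = fetch_next_row()) {
        push @batch, $row;
        if (@batch >= $batch_size) {
            $coll->batch_insert(\@batch);  # one message per batch, not per table
            @batch = ();
        }
    }
    $coll->batch_insert(\@batch) if @batch;  # flush the remainder

Obviously Bucardo would need to do the equivalent internally; I mainly
want to know whether something like this is already supported or planned.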

Please let me know if I can provide further information to help clarify
my question.

Thank you,
Ali