David E. Wheeler
david at justatheory.com
Mon Jul 29 10:01:11 UTC 2013
On Jul 28, 2013, at 4:10 AM, Greg Sabino Mullane <greg at endpoint.com> wrote:
> On Thu, Jul 25, 2013 at 09:25:40PM +0200, David E. Wheeler wrote:
>> * In practice, for relatively static databases (say a couple thousand updates a day),
>> and using autokick (née ping), how much lag can we expect between the two data centers?
> At that level, the lag is almost always related to the built-in sleeps. Lag is usually
> around 1 second average, 2 seconds tops unless something else is going on (e.g. a very
> large changeset, super busy server, pigeon network). You can tweak the sleeps downward
> to decrease the lag. On the todo is to replace some of those sleeps with proper polling.
I don’t suppose there is a blocking form of pg_notifies()? That seems to me like it would be more efficient. I guess it would still have to poll, though.
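For what it's worth, the usual workaround for a non-blocking notification check is to block on the connection's underlying socket with select() until the server sends something, then drain any pending notifications. A minimal sketch of that pattern in Python, where the socketpair stands in for the database connection's file descriptor (in a real setup something like psycopg2's conn.fileno() would be used; wait_for_ready is a hypothetical helper, not a Bucardo or driver API):

```python
import select
import socket

def wait_for_ready(fd, timeout=None):
    """Block until the descriptor is readable (i.e. the server sent
    something, such as a NOTIFY payload) or the timeout expires.
    Returns True if data is ready, False on timeout."""
    readable, _, _ = select.select([fd], [], [], timeout)
    return bool(readable)

# Demonstration with a socketpair standing in for the database socket.
server_end, client_end = socket.socketpair()

# Nothing sent yet: a short timeout elapses without readiness.
assert wait_for_ready(client_end, timeout=0.05) is False

# The "server" sends a notification; select() now wakes immediately
# instead of the client sleeping and re-polling on a fixed schedule.
server_end.send(b"NOTIFY bucardo_kick")
assert wait_for_ready(client_end, timeout=5) is True
```

This avoids the fixed sleeps entirely: the process consumes no CPU while idle and reacts as soon as the notification arrives.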
>> * If we want to shut down one of the servers for maintenance, we will point all traffic
>> at the other box. Is there an easy way to tell when all the databases on the node to
>> be shut down have had all their data synced over? Or would I just have to write a
>> script to check all of the delta table row counts until they are all zero?
> No easy way, but in general the process is stop the apps / repoint traffic, then wait
> a minute or two. You could check the status of the most busy sync, that's a pretty
> good indicator. A script sounds like a good idea, especially to make sure you manually
> kick any non-auto syncs you may have forgotten about. ISTR a script someone made once
> to kick all syncs and make sure they finished okay.
Yeah, if anyone has something like that I could start with, a pointer would be appreciated.
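In the meantime, the drain check itself is simple enough to sketch. Here count_pending is a callable that would total the undelivered rows across the delta tables (the actual query and table names depend on your Bucardo schema and are not specified in this thread, so treat them as an assumption):

```python
import time

def wait_until_drained(count_pending, timeout=120, interval=2):
    """Poll a callable returning the total number of undelivered delta
    rows; return True once it reaches zero, False if the timeout
    expires first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if count_pending() == 0:
            return True
        time.sleep(interval)
    return False

# In a real deployment count_pending would sum a count(*) over each
# delta table in the bucardo schema. Simulated successive counts:
pending = [3, 1, 0]
assert wait_until_drained(lambda: pending.pop(0), interval=0) is True
```

Wrapping this in a maintenance script that first kicks any non-automatic syncs, then waits for the counts to drain, would cover Greg's point about forgotten syncs as well.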
>> In practice, are there any problems with a user writing data and then not seeing it
>> show up, because one request goes to one server to write, then reads from the other
>> before the sync finishes? Does this happen occasionally? Rarely? Often?
> No experience with that, IMO if you are doing round-robin masters it really falls
> back to the app to do the right thing, e.g. write and read from the same db
> as much as possible. If it becomes a problem, you could train your app to make
> a db handle "sticky" for X seconds after a write, where X > average_lag_time.
Our network architecture requires no session affinity at all. I will push back a bit, though, and see if the app could perhaps be encouraged to try to use the same database connection over multiple requests from the same session. Probably not, though, since we have many app servers, with requests delivered to them randomly, and a shared-nothing architecture.
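For reference, Greg's "sticky for X seconds after a write" idea could be sketched as routing logic like the following (all names here are illustrative, not a Bucardo or driver API):

```python
import time

class StickyRouter:
    """Route reads normally, but pin a session to the master it last
    wrote to for `stickiness` seconds (X > average_lag_time), so the
    session never reads data older than its own writes."""
    def __init__(self, stickiness=2.0):
        self.stickiness = stickiness
        self._last_write = {}  # session_id -> (master, timestamp)

    def record_write(self, session_id, master):
        self._last_write[session_id] = (master, time.monotonic())

    def pick_for_read(self, session_id, default_db):
        entry = self._last_write.get(session_id)
        if entry:
            master, when = entry
            if time.monotonic() - when < self.stickiness:
                return master  # still inside the lag window: stay put
        return default_db

router = StickyRouter(stickiness=2.0)
router.record_write("sess-1", "db-a")
assert router.pick_for_read("sess-1", "db-b") == "db-a"  # sticky
assert router.pick_for_read("sess-2", "db-b") == "db-b"  # never wrote
```

With many app servers and random request delivery, the sticky state would of course have to live somewhere shared (or be carried in the session itself), which is exactly the complication noted above.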