[Bucardo-general] Wrong value for one field

Ioana Danes ioanadanes at gmail.com
Mon Aug 8 18:14:06 UTC 2016


Hi,

I am having a weird case where the value for one field (one record only) is
incorrect. It is hard to conclude if it is a replication issue or data
corruption.

Here are the facts:

I am running postgres 9.4.8:

postgresql94-9.4.8-1PGDG.rhel7.x86_64
postgresql94-contrib-9.4.8-1PGDG.rhel7.x86_64
postgresql94-libs-9.4.8-1PGDG.rhel7.x86_64
postgresql94-server-9.4.8-1PGDG.rhel7.x86_64

on CentOS Linux release 7.2.1511 (Core)

This is happening in a production environment but luckily on the reporting
database.
I have a cluster of 3 databases, db1 and db2 are masters and replicate
between each other and also replicate to db3 (db1 <-> db2, db1 -> db3, db2
-> db3).
For replication I am using Bucardo 5.4.1.

The problem I am having is on db3: one record in a table it shows a wrong
value for one single field:

select gameplayid, transactionid, encodedplay from mytable where
transactionid in (75315811, 75315815) order by transactionid;

 gameplayid | transactionid | encodedplay
------------+---------------+--------------
  160019239 |      75315811 | mix:5,2,7
  160019237 |      75315811 | mix:5,4,8
  160019235 |      75315811 | mix:6,2,9
  160019233 |      75315811 | mix:1,9,8
  160019271 |      75315815 | mix:9,0,9
  160019269 |      75315815 | mix:9,8,9
  160019267 |      75315815 | mix:9,2,2
  160019265 |      75315815 | mix:2,2,8
  160019263 |      *75315811* | backup:1,9,1 -- this record should have
transactionid  = *75315815*
  160019261 |      75315815 | backup:2,0,9

select gameplayid, transactionid, encodedplay from mytable where
transactionid in (75315815) order by transactionid;

 gameplayid | transactionid | encodedplay
------------+---------------+--------------
  160019271 |      75315815 | mix:9,0,9
  160019269 |      75315815 | mix:9,8,9
  160019267 |      75315815 | mix:9,2,2
  160019265 |      75315815 | mix:2,2,8
  160019263 |      *75315811* | backup:1,9,1 -- this record should have
transactionid  = *75315815* and it does show as output on this query
  160019261 |      75315815 | backup:2,0,9

select gameplayid, transactionid, encodedplay from mytable where
transactionid in (75315811) order by transactionid;

 gameplayid | transactionid | encodedplay
------------+---------------+--------------
  160019239 |      75315811 | mix:5,2,7
  160019237 |      75315811 | mix:5,4,8
  160019235 |      75315811 | mix:6,2,9
  160019233 |      75315811 | mix:1,9,8

So the record with gameplayid = 160019263 have a wrong transactionid,
75315811 instead of 75315815.
The correct value is 75315815 and that I know because of the following
facts:
- on db1 and db2 transactionid = 75315815 for gameplayid = 160019263,
- this table gets mostly inserts, very rare updates and only on other 2
fields not this one.
- there is another parent table that shows the number of records in this
table which is 4 for transactionid =75315811  and 6 for transactionid =
7531581.

This table has an index by transactionid and that index seem correct
because the filtering and the ordering are fine (like the filed has the
correct value)...

What puzzles me is that the value that shows in this field is a real value
from another record...

I only caught this issue because I have a script that runs in the night
that compares the databases ...

By now I updated the field with the correct value and everything seem
stable.

Postgres logs don't have any information about file corruption or any other
kind of error. I also checked other logs on the system and I could not find
any traces of corruption.

Alsoi the bucardo log shows no signs of errors or corruption around that
time.

So I either have a replication issue + index corruption on db3 or a data
corruption on db3...

Any ideas, thoughts?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.endcrypt.com/pipermail/bucardo-general/attachments/20160808/1deaf910/attachment.html>


More information about the Bucardo-general mailing list