
MySQL Forums :: Cluster :: [Data Loss] Cluster nodes have different data



[Data Loss] Cluster nodes have different data
Posted by: eric
Date: October 13, 2005 05:16PM

Hi all,

I'm running MySQL 4.1.12-max on SuSE Linux 9.2. I have a four-node cluster: three data nodes and one management (MGM) node. The MGM node is node 1, the NDB data nodes are 2, 3, and 4, and there are nine API node slots (5-13). Today the following sequence of events occurred, according to the log file on the MGM node (full logs are available on request):

MGM node detects NDB node 2 disconnection
MGM node arbitrates and selects NDB node 3 as new master
MGM node detects NDB node 3 disconnection
MGM node arbitrates and selects NDB node 4 as new master
NDB node 4 starts taking over (table fragments scroll by with LcpStatus 3)
NDB node 2 reconnects and starts up, with CM_president = 4, own Node = 2, our dynamic id = 4
NDB node 3 reconnects and begins the startup process
NDB node 2 loads all indexes and completes the startup process
NDB node 3 disconnects again
MGM node arbitrates and NDB node 4 wins the election again
NDB node 3 reconnects, loads indexes, and completes startup.

Now, the problems:

NDB nodes 2 and 4 log the message "Failure handling of node 3 has not completed in [n] min." once every minute.
API nodes can still write data to the cluster via INSERT/UPDATE, but some writes are handled by node 3 and others by node 4.

As a result, an API node querying NDB node 4 sees different data than one querying node 3.
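Because the warning repeats every minute while failure handling is stuck, it can be picked up automatically. Below is a minimal sketch, assuming the message text appears in the MGM node's cluster log exactly as quoted above. The function name `stuck_failure_handling` and the `threshold_min` parameter are hypothetical, and the surrounding log-line format is not known from this post, so the pattern only matches the quoted message itself.

```python
import re

# Message text as quoted in the post; the rest of the log-line format is assumed unknown.
PENDING_FAILURE_RE = re.compile(
    r"Failure handling of node (\d+) has not completed in (\d+) min"
)


def stuck_failure_handling(log_lines, threshold_min=2):
    """Return {node_id: worst_minutes} for nodes whose failure handling has
    been reported as pending for at least threshold_min minutes (hypothetical helper)."""
    worst = {}
    for line in log_lines:
        match = PENDING_FAILURE_RE.search(line)
        if match:
            node, minutes = int(match.group(1)), int(match.group(2))
            # Keep the largest "n min" value seen for each failed node.
            worst[node] = max(worst.get(node, 0), minutes)
    return {node: mins for node, mins in worst.items() if mins >= threshold_min}
```

Fed the cluster log line by line, a non-empty result could be wired into an alert, so the situation is noticed while it is happening rather than after writes have diverged.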

Backups started at this point do not complete, even though the logs report them as successful. Strangely, node 4 writes two fragments, node 3 writes one, and node 2 does not run the backup at all (at least, no files are generated in /clusterdata/backup).
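Because the logs reported success while node 2 wrote nothing, the backup files on each data node could be checked directly. Below is a minimal sketch under stated assumptions: the caller lists the file names found in each data node's backup directory (including an empty list for a node with no files), and the file naming `BACKUP-<id>.<node>.ctl`, `BACKUP-<id>-<n>.<node>.Data`, `BACKUP-<id>.<node>.log` is assumed from NDB backup documentation and may differ in 4.1. The function name `missing_backup_parts` is hypothetical.

```python
import re


def missing_backup_parts(backup_id, files_by_node):
    """files_by_node maps data node ID -> file names in that node's backup
    directory. Returns {node_id: [missing parts]} for incomplete nodes.
    The file naming checked here is an assumption, not confirmed for 4.1."""
    problems = {}
    for node, files in files_by_node.items():
        names = set(files)
        missing = []
        # Assumed control and log file names for this backup on this node.
        if f"BACKUP-{backup_id}.{node}.ctl" not in names:
            missing.append("ctl")
        if f"BACKUP-{backup_id}.{node}.log" not in names:
            missing.append("log")
        # Assumed data file name(s): BACKUP-<id>-<n>.<node>.Data
        data_re = re.compile(rf"BACKUP-{backup_id}-\d+\.{node}\.Data")
        if not any(data_re.fullmatch(name) for name in names):
            missing.append("Data")
        if missing:
            problems[node] = missing
    return problems
```

In the situation described here, node 2's empty directory would be reported as missing every part even though the backup was logged as successful.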

This occurred on a production system, and we lost an hour's worth of data while I struggled to believe that the cluster had actually become desynchronized.

Has anyone seen this kind of behavior before? What could have caused it? How can I prevent it in the future? We're horrified that production data was lost and would like to be confident that it won't happen again.

Thanks in advance for any help.

-e



Subject Views Written By Posted
[Data Loss] Cluster nodes have different data 946 eric 10/13/2005 05:16PM
Re: [Data Loss] Cluster nodes have different data 564 Stewart Smith 10/13/2005 06:33PM
Re: [Data Loss] Cluster nodes have different data 608 eric 10/14/2005 08:45AM
Re: [Data Loss] Cluster nodes have different data 551 eric 12/06/2005 03:50PM
Re: [Data Loss] Cluster nodes have different data 553 Stewart Smith 12/07/2005 01:50AM
Re: [Data Loss] Cluster nodes have different data 565 eric 01/13/2006 09:10AM
Re: [Data Loss] Cluster nodes have different data 567 Stewart Smith 01/15/2006 05:00PM
Re: [Data Loss] Cluster nodes have different data 526 eric 02/23/2006 04:57PM

