Error in ndb node failover
Posted by:
Leo Chan
Date: November 21, 2005 12:56AM
Hi,
I have three servers that are used for a MySQL Cluster. The config.ini is as follows:
[NDBD DEFAULT]
NoOfReplicas=2 # Number of replicas
DataMemory=350M # How much memory to allocate for data storage
IndexMemory=100M # How much memory to allocate for index storage
# For DataMemory and IndexMemory, we have used the
# default values. Since the "world" database takes up
# only about 500KB, this should be more than enough for
# this example Cluster setup.
# Management process options:
[NDB_MGMD]
hostname=192.168.0.26 # Hostname or IP address of MGM node
datadir=/usr/local/mysql/cluster # Directory for MGM node logfiles
[NDB_MGMD]
hostname=192.168.0.27 # Hostname or IP address of MGM node
datadir=/usr/local/mysql/cluster # Directory for MGM node logfiles
# Options for data node "A":
[NDBD]
# (one [NDBD] section per data node)
hostname=192.168.0.26 # Hostname or IP address
datadir=/usr/local/mysql/data # Directory for this data node's datafiles
# Options for data node "B":
[NDBD]
hostname=192.168.0.27 # Hostname or IP address
datadir=/usr/local/mysql/data # Directory for this data node's datafiles
# SQL node options:
[MYSQLD]
hostname=192.168.0.30 # Hostname or IP address
[MYSQLD]
hostname=192.168.0.27
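One thing I am not sure about: with only these two hosts, the arbitrator (by default one of the ndb_mgmd nodes) always lives on the same machine as a data node. A layout I considered (a sketch only; putting the management node on 192.168.0.30, the SQL-node host, is my own idea, not something I have tested) would keep the arbitrator alive when either data-node host is unplugged:

```ini
# Sketch: run the management server (the default arbitrator) on a
# third host that carries no ndbd process, so losing one data-node
# host can never take the arbitrator down with it.
[NDB_MGMD]
hostname=192.168.0.30             # Host with no data node
datadir=/usr/local/mysql/cluster  # Directory for MGM node logfiles
ArbitrationRank=1                 # Prefer this node as arbitrator

[NDBD]
hostname=192.168.0.26
datadir=/usr/local/mysql/data

[NDBD]
hostname=192.168.0.27
datadir=/usr/local/mysql/data
```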
The output from ndb_mgm is as follows:
################## Initial run ##################
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: 192.168.0.27:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 @192.168.0.26 (Version: 5.0.15, Nodegroup: 0, Master)
id=4 @192.168.0.27 (Version: 5.0.15, Nodegroup: 0)
[ndb_mgmd(MGM)] 2 node(s)
id=1 @192.168.0.26 (Version: 5.0.15)
id=2 @192.168.0.27 (Version: 5.0.15)
[mysqld(API)] 2 node(s)
id=5 @192.168.0.30 (Version: 5.0.15)
id=6 @192.168.0.27 (Version: 5.0.15)
###############################################
I think everything is correct up to this point, as I inserted "1" record on 192.168.0.27 and it was viewable on 192.168.0.30. In /etc/my.cnf I configured the mysqld (API) node on 192.168.0.30 to point at the ndbd process on 192.168.0.26, and the one on 192.168.0.27 to point at the ndbd on 192.168.0.27.
Afterwards, I unplugged the cable on the ndb node 192.168.0.26 and found I could no longer access the tables running as ndbcluster. The mysql client gave this error: "ERROR 1015 (HY000): Can't lock file (errno: 4009)". When I ran ps -elf | grep ndbd on 192.168.0.27, no ndbd process was running, although on the initial run I had started it with ndbd --initial and could see those processes. ndb_mgm showed the following:
################# 192.168.0.26 unplugged @14:26 ###########
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: 192.168.0.27:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 (not connected, accepting connect from 192.168.0.26)
id=4 (not connected, accepting connect from 192.168.0.27)
[ndb_mgmd(MGM)] 2 node(s)
id=1 (not connected, accepting connect from 192.168.0.26)
id=2 @192.168.0.27 (Version: 5.0.15)
[mysqld(API)] 2 node(s)
id=5 (not connected, accepting connect from 192.168.0.30)
id=6 (not connected, accepting connect from 192.168.0.27)
######################################################
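In case it is relevant, these are the logs I would check on 192.168.0.27 for the reason ndbd exited (file names are my assumption from the datadir settings above and the node ids 2 and 4 shown by "show"; I would guess an arbitration shutdown):

```
shell-27> tail /usr/local/mysql/cluster/ndb_2_cluster.log
shell-27> tail /usr/local/mysql/data/ndb_4_error.log
```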
I ran ndbd again on 192.168.0.27, and ndb_mgm showed this after a few minutes:
################# 192.168.0.27 after restarting ndbd and waiting a few minutes #######
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 (not connected, accepting connect from 192.168.0.26)
id=4 @192.168.0.27 (Version: 5.0.15, Nodegroup: 0, Master)
[ndb_mgmd(MGM)] 2 node(s)
id=1 (not connected, accepting connect from 192.168.0.26)
id=2 @192.168.0.27 (Version: 5.0.15)
[mysqld(API)] 2 node(s)
id=5 @192.168.0.30 (Version: 5.0.15)
id=6 @192.168.0.27 (Version: 5.0.15)
#######################################################
At this point I could access the table again on 192.168.0.27 or .30, but I had to wait almost 5 minutes, which is too long for the cluster environment I am presenting as a selling point to my client. I inserted 2 more records without problems, so there are "3" records in the table in total.
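Is the 5-minute resume time something I can tune? My reading of the manual (please correct me if I am wrong) is that these [NDBD DEFAULT] parameters govern how fast failures are detected and arbitration completes; the values below are a sketch, not tested settings:

```ini
[NDBD DEFAULT]
# All values are in milliseconds. Shorter intervals detect a dead
# node sooner, at the cost of more sensitivity to network glitches.
HeartbeatIntervalDbDb=1500    # Heartbeats between data nodes
HeartbeatIntervalDbApi=1500   # Heartbeats between data and API nodes
ArbitrationTimeout=3000       # How long to wait for the arbitrator
StartPartialTimeout=30000     # Wait for missing nodes at startup
```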
Finally, I re-plugged the server 192.168.0.26, the ndb node that was unplugged before. ndb_mgm showed the following:
################# 192.168.0.26 re-plugged ###############
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: 192.168.0.27:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 @192.168.0.26 (Version: 5.0.15, Nodegroup: 0, Master)
id=4 @192.168.0.27 (Version: 5.0.15, Nodegroup: 0, Master)
[ndb_mgmd(MGM)] 2 node(s)
id=1 @192.168.0.26 (Version: 5.0.15)
id=2 @192.168.0.27 (Version: 5.0.15)
[mysqld(API)] 2 node(s)
id=5 @192.168.0.30 (Version: 5.0.15)
id=6 @192.168.0.27 (Version: 5.0.15)
########################################################
After that, when I query on 192.168.0.30, it sometimes returns 1 record (the old image on .26) and, querying again, sometimes 3 records (the new image on .27). If I repeat the query, it alternates between the images of .26 and .27. However, if I query 192.168.0.27, it only returns the new image, 3 records.
1. Why did the ndbd process disappear on 192.168.0.27 after 192.168.0.26 was unplugged?
2. The resumption of service on 192.168.0.27 and .30 took several minutes, which is quite long for a cluster environment.
3. Why, after re-plugging 192.168.0.26, can 192.168.0.30 query the new image on .27 and the old image on .26 simultaneously? This is data corruption.
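For question 3, I wonder if I brought 192.168.0.26 back the wrong way. Should the rejoining node be wiped and resynchronised from the survivor, rather than just started and left serving its old local data? A sketch of what I mean (standard ndbd/ndb_mgm commands; whether --initial is required here is exactly what I am asking):

```
# On the rejoining host 192.168.0.26: wipe the stale local
# checkpoint and rejoin, so the data is copied from the live node.
shell-26> ndbd --initial

# Watch the node restart complete from the management client.
ndb_mgm> all status
```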