Ndbd crashed few hours after upgrade, restart hangs in phase 5
Posted by: Clause Grégory
Date: December 04, 2006 10:48AM

This morning I upgraded my cluster from 5.0.24a to 5.0.27 using the rolling restart method (see http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster-rolling-restart.html).
Everything went fine for 9 hours, but then one data node crashed. Here are the management server logs:

2006-12-04 14:53:55 [MgmSrvr] INFO -- Node 2: Local checkpoint 1602 started. Keep GCI = 355249 oldest restorable GCI = 355276
2006-12-04 15:00:48 [MgmSrvr] INFO -- Node 2: Local checkpoint 1603 started. Keep GCI = 355443 oldest restorable GCI = 355470
2006-12-04 15:01:19 [MgmSrvr] WARNING -- Node 2: Node 3 missed heartbeat 2
2006-12-04 15:01:20 [MgmSrvr] WARNING -- Node 2: Node 3 missed heartbeat 3
2006-12-04 15:01:21 [MgmSrvr] INFO -- Node 1: Node 3 Connected
2006-12-04 15:01:22 [MgmSrvr] WARNING -- Node 2: Node 3 missed heartbeat 4
2006-12-04 15:01:22 [MgmSrvr] ALERT -- Node 2: Node 3 declared dead due to missed heartbeat
2006-12-04 15:01:22 [MgmSrvr] INFO -- Node 2: Communication to Node 3 closed
2006-12-04 15:01:22 [MgmSrvr] ALERT -- Node 2: Network partitioning - arbitration required
2006-12-04 15:01:22 [MgmSrvr] INFO -- Node 2: President restarts arbitration thread [state=7]
2006-12-04 15:01:22 [MgmSrvr] ALERT -- Node 2: Arbitration won - positive reply from node 1
2006-12-04 15:01:22 [MgmSrvr] INFO -- Node 2: DICT: lock bs: 0 ops: 0 poll: 0 cnt: 0 queue:
2006-12-04 15:01:22 [MgmSrvr] ALERT -- Node 2: Node 3 Disconnected
2006-12-04 15:01:22 [MgmSrvr] ALERT -- Node 2: Backup 216 started from 1 has been aborted. Error: 1326
2006-12-04 15:01:22 [MgmSrvr] INFO -- Node 2: Started arbitrator node 1 [ticket=5c0b00054dc4dab3]
2006-12-04 15:01:36 [MgmSrvr] ALERT -- Node 3: Forced node shutdown completed. Initiated by signal 0. Caused by error 6050: 'WatchDog terminate, internal error or massive overload on the machine running this node(Internal error, programming error or missing error message, please report a bug).
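
To make the timing in the log above easier to read, here is a minimal Python sketch that parses lines in this format and measures how long it took from the first reported missed heartbeat to the node being declared dead. The function names (`parse_cluster_log`, `heartbeat_failure_window`) are made up for illustration, and the regular expression only assumes the line layout visible in the excerpt above; it is not an official parser for the cluster log.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2006-12-04 15:01:19 [MgmSrvr] WARNING -- Node 2: Node 3 missed heartbeat 2
# The layout is inferred from the log excerpt in this post (assumption).
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[MgmSrvr\] "
    r"(?P<level>[A-Z]+)\s+--\s+Node (?P<node>\d+): (?P<msg>.*)$"
)


def parse_cluster_log(lines):
    """Turn raw log lines into a list of event dicts; skip lines that do not match."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        events.append({
            "time": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S"),
            "level": m.group("level"),
            "node": int(m.group("node")),  # the node that reported the event
            "msg": m.group("msg"),
        })
    return events


def heartbeat_failure_window(events, failed_node):
    """Seconds between the first missed heartbeat and 'declared dead', or None."""
    missed = [e["time"] for e in events
              if e["msg"].startswith(f"Node {failed_node} missed heartbeat")]
    dead = [e["time"] for e in events
            if e["msg"].startswith(f"Node {failed_node} declared dead")]
    if not missed or not dead:
        return None
    return (dead[0] - missed[0]).total_seconds()


# Four lines copied from the log above.
SAMPLE = """\
2006-12-04 15:01:19 [MgmSrvr] WARNING -- Node 2: Node 3 missed heartbeat 2
2006-12-04 15:01:20 [MgmSrvr] WARNING -- Node 2: Node 3 missed heartbeat 3
2006-12-04 15:01:22 [MgmSrvr] WARNING -- Node 2: Node 3 missed heartbeat 4
2006-12-04 15:01:22 [MgmSrvr] ALERT -- Node 2: Node 3 declared dead due to missed heartbeat
"""

events = parse_cluster_log(SAMPLE.splitlines())
print(heartbeat_failure_window(events, failed_node=3))
```

On these four lines the script prints 3.0, i.e. node 3 went from the first logged missed heartbeat to being declared dead within about three seconds.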

I tried to restart the failed node with --initial, but it hangs in start phase 5. I had a similar problem last week when I wanted to increase the MaxNoOfTables parameter: I did a rolling restart of the cluster, but the second data node never restarted (I waited 10 hours, and finally the cluster crashed).



Config file:


[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=2048M
IndexMemory=512M
LockPagesInMainMemory=Y
MaxNoOfUniqueHashIndexes=1024
MaxNoOfOrderedIndexes=512
MaxNoOfConcurrentOperations=65535
MaxNoOfAttributes=10000
MaxNoOfTables=512


[MYSQLD DEFAULT]
[NDB_MGMD DEFAULT]
[TCP DEFAULT]

# Management Servers

[NDB_MGMD]
id=1
HostName=192.168.0.155
DataDir=/usr/local/mysql/mysql-cluster

[NDB_MGMD]
id=10
HostName=192.168.0.152
DataDir=/usr/local/mysql/mysql-cluster

# Storage Engines

[NDBD]
id=2
HostName=192.168.0.151
DataDir=/usr/local/mysql/mysql-cluster

[NDBD]
id=3
HostName=192.168.0.156
DataDir=/usr/local/mysql/mysql-cluster


#API Nodes

[MYSQLD]
id=20
HostName=192.168.0.152

[MYSQLD]
id=21
HostName=192.168.0.155

[MYSQLD]
id=22
HostName=192.168.0.151

[MYSQLD]
id=23
HostName=192.168.0.156

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

NB: Nodes 22 and 23 are not performing queries.

Could you tell me if I missed something?
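
As a quick sanity check on the layout above, the following Python sketch lists the node IDs per section type and checks that the number of data nodes is a multiple of NoOfReplicas. It uses a small hand-written parser because config.ini repeats section names such as [NDBD] and [MYSQLD]. The helper names (`parse_sections`, `node_ids`) are hypothetical, and the sample is an abbreviated copy of the config in this post, so treat this as an illustration rather than a validation tool.

```python
def parse_sections(text):
    """Return [(SECTION_NAME, {key: value}), ...] in file order.

    Repeated section names are kept as separate entries.
    """
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].upper(), {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def node_ids(sections, kind):
    """IDs of all sections named `kind`; None for a slot without an explicit id."""
    return [params.get("id") for name, params in sections if name == kind]


# Abbreviated copy of the config file from this post.
SAMPLE_CONFIG = """\
[NDBD DEFAULT]
NoOfReplicas=2

# Management server
[NDB_MGMD]
id=1
HostName=192.168.0.155

[NDBD]
id=2
HostName=192.168.0.151

[NDBD]
id=3
HostName=192.168.0.156

[MYSQLD]
id=20
HostName=192.168.0.152

[MYSQLD]
"""

sections = parse_sections(SAMPLE_CONFIG)
defaults = dict(sections)["NDBD DEFAULT"]
replicas = int(defaults["NoOfReplicas"])
data_nodes = node_ids(sections, "NDBD")

print("data nodes:", data_nodes)
print("API slots:", node_ids(sections, "MYSQLD"))
print("data node count is a multiple of NoOfReplicas:",
      len(data_nodes) % replicas == 0)
```

With two data nodes and NoOfReplicas=2, the last check comes out true, which matches the single node group this configuration implies.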
Thanks,

Regards,
