MySQL Forums

Help needed finding a bug in NDB Cluster 7.5.11
Posted by: Nimbi lin
Date: November 13, 2018 01:34AM

I need help finding a bug in NDB Cluster 7.5.11. The cluster log is below:
Node 23: Stall LCP: current stall time: 0 secs, max wait time:11 secs
2018-11-13 15:01:22 [MgmtSrvr] INFO -- Node 23: Local checkpoint 4043 started. Keep GCI = 3283088 oldest restorable GCI = 3283119
2018-11-13 15:03:33 [MgmtSrvr] INFO -- Node 23: LDM(0): Completed LCP, #frags = 1152 #records = 21314442, #bytes = 4108609828
2018-11-13 15:03:33 [MgmtSrvr] INFO -- Node 23: Local checkpoint 4043 completed
2018-11-13 15:03:34 [MgmtSrvr] INFO -- Node 23: Stall LCP, LCP time = 131 secs, wait for Node24, state Synchronize start node with live nodes
2018-11-13 15:03:34 [MgmtSrvr] INFO -- Node 23: Stall LCP: current stall time: 0 secs, max wait time:9 secs
2018-11-13 15:03:43 [MgmtSrvr] INFO -- Node 23: Local checkpoint 4044 started. Keep GCI = 3283166 oldest restorable GCI = 3283149
2018-11-13 15:03:45 [MgmtSrvr] ALERT -- Node 24: Forced node shutdown completed. Occured during startphase 5. Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2018-11-13 15:03:45 [MgmtSrvr] ALERT -- Node 23: Node 24 Disconnected
2018-11-13 15:03:45 [MgmtSrvr] INFO -- Node 23: Communication to Node 24 closed
2018-11-13 15:03:45 [MgmtSrvr] ALERT -- Node 23: Network partitioning - arbitration required
2018-11-13 15:03:45 [MgmtSrvr] INFO -- Node 23: President restarts arbitration thread [state=7]
2018-11-13 15:03:45 [MgmtSrvr] ALERT -- Node 22: Node 24 Disconnected
2018-11-13 15:03:45 [MgmtSrvr] ALERT -- Node 23: Arbitration won - positive reply from node 22
2018-11-13 15:03:45 [MgmtSrvr] INFO -- Node 23: NR Status: node=24,OLD=Synchronize start node with live nodes,NEW=Node failed, fail handling on
2018-11-13 15:03:45 [MgmtSrvr] INFO -- Node 23: Removed lock for node 24
2018-11-13 15:03:45 [MgmtSrvr] INFO -- Node 23: DICT: remove lock by failed node 24 for NodeRestart
2018-11-13 15:03:45 [MgmtSrvr] INFO -- Node 23: DICT: unlocked by node 24 for NodeRestart
2018-11-13 15:03:46 [MgmtSrvr] INFO -- Node 23: Started arbitrator node 22 [ticket=f07b00582279a5e5]
2018-11-13 15:04:14 [MgmtSrvr] WARNING -- Node 23: Failure handling of node 24 has not completed in 29 seconds - state = 6
2018-11-13 15:04:14 [MgmtSrvr] INFO -- Node 23: NF Node 24 tc: 1 lqh: 1 dih: 0 dict: 1 recNODE_FAILREP: 1
2018-11-13 15:04:14 [MgmtSrvr] INFO -- Node 23: m_NF_COMPLETE_REP: [SignalCounter: m_count=1 0000000000800000] m_nodefailSteps: 00000002
2018-11-13 15:04:25 [MgmtSrvr] INFO -- Node 23: NR Status: node=24,OLD=Node failed, fail handling ongoing,NEW=Node failure handling complete
2018-11-13 15:04:25 [MgmtSrvr] INFO -- Node 23: Communication to Node 24 opened
2018-11-13 15:05:46 [MgmtSrvr] INFO -- Node 23: LDM(0): Completed LCP, #frags = 1152 #records = 21314469, #bytes = 4108625512
2018-11-13 15:05:46 [MgmtSrvr] INFO -- Node 23: Local checkpoint 4044 completed
2018-11-13 15:05:46 [MgmtSrvr] INFO -- Node 23: Stall LCP, LCP time = 122 secs, wait for Node24, state Node failure handling complete
2018-11-13 15:05:46 [MgmtSrvr] INFO -- Node 23: Stall LCP: current stall time: 0 secs, max wait time:9 secs
2018-11-13 15:05:55 [MgmtSrvr] INFO -- Node 23: Local checkpoint 4045 started. Keep GCI = 3283235 oldest restorable GCI = 3283237
2018-11-13 15:09:42 [MgmtSrvr] INFO -- Node 23: LDM(0): Completed LCP, #frags = 1152 #records = 21314480, #bytes = 4108632000
2018-11-13 15:09:42 [MgmtSrvr] INFO -- Node 23: Local checkpoint 4045 completed
2018-11-13 15:09:43 [MgmtSrvr] INFO -- Node 23: Stall LCP, LCP time = 226 secs, wait for Node24, state Node failure handling complete
2018-11-13 15:09:43 [MgmtSrvr] INFO -- Node 23: Stall LCP: current stall time: 0 secs, max wait time:16 secs
2018-11-13 15:09:58 [MgmtSrvr] INFO -- Node 23: Local checkpoint 4046 started. Keep GCI = 3283299 oldest restorable GCI = 3283295
2018-11-13 15:13:45 [MgmtSrvr] INFO -- Node 23: LDM(0): Completed LCP, #frags = 1152 #records = 21314520, #bytes = 4108655412
2018-11-13 15:13:45 [MgmtSrvr] INFO -- Node 23: Local checkpoint 4046 completed
2018-11-13 15:13:46 [MgmtSrvr] INFO -- Node 23: Stall LCP, LCP time = 226 secs, wait for Node24, state Node failure handling complete
2018-11-13 15:13:46 [MgmtSrvr] INFO -- Node 23: Stall LCP: current stall time: 0 secs, max wait time:16 secs
2018-11-13 15:14:01 [MgmtSrvr] INFO -- Node 23: Local checkpoint 4047 started. Keep GCI = 3283417 oldest restorable GCI = 3283412
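
To make the checkpoint timings in the log above easier to compare, here is a minimal sketch that pairs each "Local checkpoint N started" line with its "completed" line and computes the elapsed seconds. The regular expression and the function name `lcp_durations` are assumptions made up for this illustration, modeled only on the log lines quoted in this post; they are not part of any NDB Cluster tooling.

```python
import re
from datetime import datetime

# Hypothetical pattern, modeled on the cluster log lines quoted in this post.
LCP_EVENT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[MgmtSrvr\] INFO -- "
    r"Node (?P<node>\d+): Local checkpoint (?P<lcp>\d+) (?P<what>started|completed)"
)


def lcp_durations(lines):
    """Return {(node_id, lcp_id): seconds} for every checkpoint that has
    both a 'started' and a 'completed' line in the given log lines."""
    started = {}
    durations = {}
    for line in lines:
        m = LCP_EVENT.match(line.strip())
        if not m:
            continue  # not an LCP start/completion line
        key = (int(m["node"]), int(m["lcp"]))
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        if m["what"] == "started":
            started[key] = ts
        elif key in started:
            durations[key] = int((ts - started[key]).total_seconds())
    return durations


# Two lines copied from the log above (checkpoint 4043 on node 23).
sample = [
    "2018-11-13 15:01:22 [MgmtSrvr] INFO -- Node 23: Local checkpoint 4043 started. "
    "Keep GCI = 3283088 oldest restorable GCI = 3283119",
    "2018-11-13 15:03:33 [MgmtSrvr] INFO -- Node 23: Local checkpoint 4043 completed",
]
print(lcp_durations(sample))
```

For checkpoint 4043 the computed value can be compared with the "LCP time = 131 secs" reported by the cluster on the following "Stall LCP" line.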


The bug appeared after the following steps:
1. I have an NDB Cluster with 2 data nodes and 2 SQL nodes on CentOS 6.8. Node 24's hard disk was almost out of space.
2. I stopped node 24 with the command "24 stop" in the ndb_mgm console.
3. I then used pvcreate and other commands to extend the LVM volume of the root file system (/).
4. After successfully extending the disk space, I started ndbd without the initial option, but got the error: "startphase 5 error 2355: 'Failure to restore schema(Resource configuration error). Permanent error, external action needed'."
5. I then started node 24 with ndbd's initial option, but got the errors shown in the log above.
6. Sorry, I now remember that I also added these 3 variables to config.ini:
TimeBetweenLocalCheckpoints=10
#not work NoOfFragmentLogFiles=32
#ok MaxNoOfExecutionThreads=6
I added them to fix error 2355, but I restarted only the management node and forgot to restart the other data node.


Could an NDB Cluster expert please help me solve this as soon as possible?




Edited 1 time(s). Last edit at 11/13/2018 02:18AM by Nimbi lin.


Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.