MySQL Forums

NDB node restart takes very long?
Posted by: fab
Date: February 16, 2006 08:00AM

Hello

Setup:

2 NDB servers running Red Hat Enterprise Linux 4, each with 2 dual-core 64-bit CPUs, 6 GB of RAM and a SCSI drive
2 API servers running Gentoo (kernel 2.6), each with 1 dual-core 64-bit CPU, 1 GB of RAM and a SATA drive

My problem:

I have a PHP script that forks several processes to simulate the application with some typical queries.
During this simulation I kill one NDB node to test a crash.
I restart the node with -n and --initial, but it can take 2 days or more to come up, and during that time the ndbd process does nothing (100% idle).
If the other node crashes while the restart is in progress, the cluster is completely down.
During the restart I see no LCP (local checkpoint) in the log file.

My questions:
Why can the NDB node take 2 days or more to come up?
Why don't I see any LCP in the log file?
Is this normal, and do you know a way to get a quick restart?
Is it a bug or not?

If you have a good solution, please post a reply.

Thanks

My workaround for a quick restart:
The only way I have found is to stop all NDB nodes, but that is a very bad solution.

Config :
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=10 @*.*.*.20 (Version: 5.0.18, Nodegroup: 0, Master)
id=11 @*.*.*.21 (Version: 5.0.18, Nodegroup: 0)

[ndb_mgmd(MGM)] 2 node(s)
id=1 @*.*.*.11 (Version: 5.0.18)
id=2 @*.*.*.12 (Version: 5.0.18)

[mysqld(API)] 8 node(s)
id=20 @*.*.*.11 (Version: 5.0.18)
id=21 @*.*.*.12 (Version: 5.0.18)
id=22 (not connected, accepting connect from *.*.*.20)
id=23 (not connected, accepting connect from *.*.*.21)
id=24 (not connected, accepting connect from *.*.*.13)
id=25 (not connected, accepting connect from *.*.*.14)
id=26 (not connected, accepting connect from *.*.*.15)
id=27 (not connected, accepting connect from *.*.*.16)
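
To check from a script which nodes are attached while a restart is in progress, a listing like the one above can be parsed. This is only a minimal sketch: it assumes the listing is the management client's SHOW output in exactly the two line shapes shown here, and the names `NODE_LINE` and `parse_nodes` are made up for illustration.

```python
import re

# Matches one node line of the listing, for example:
#   id=10 @*.*.*.20 (Version: 5.0.18, Nodegroup: 0, Master)
#   id=22 (not connected, accepting connect from *.*.*.20)
NODE_LINE = re.compile(r"^id=(\d+)\s+(?:@(\S+)\s+)?\((.*)\)\s*$")


def parse_nodes(listing):
    """Return {node_id: {"address": str or None, "connected": bool}}."""
    nodes = {}
    for line in listing.splitlines():
        m = NODE_LINE.match(line.strip())
        if not m:
            continue  # section headers and blank lines
        node_id, address, details = m.groups()
        nodes[int(node_id)] = {
            "address": address,
            "connected": not details.startswith("not connected"),
        }
    return nodes


# A few lines copied from the listing above.
SAMPLE = """\
[ndbd(NDB)] 2 node(s)
id=10 @*.*.*.20 (Version: 5.0.18, Nodegroup: 0, Master)
id=11 @*.*.*.21 (Version: 5.0.18, Nodegroup: 0)

[mysqld(API)] 8 node(s)
id=20 @*.*.*.11 (Version: 5.0.18)
id=22 (not connected, accepting connect from *.*.*.20)
"""

nodes = parse_nodes(SAMPLE)
disconnected = sorted(n for n, info in nodes.items() if not info["connected"])
print("not connected:", disconnected)
```

Note that "connected" here only means the node is attached to the cluster; it says nothing about whether a data node has finished its start phases.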


Config.ini:

NoOfReplicas=2

LockPagesInMainMemory=Y

DataMemory=2000M

IndexMemory=160M

MaxNoOfOrderedIndexes=1024

MaxNoOfUniqueHashIndexes=512

MaxNoOfAttributes=4000

MaxNoOfTables=256

MaxNoOfTriggers=2048

MaxNoOfConcurrentOperations=50000

MaxNoOfOpenFiles=60

#REDO and LCP
NoOfFragmentLogFiles=40
NoOfDiskPagesToDiskAfterRestartTUP=60
NoOfDiskPagesToDiskDuringRestartTUP=120
NoOfDiskPagesToDiskAfterRestartACC=6
NoOfDiskPagesToDiskDuringRestartACC=12


TimeBetweenLocalCheckpoints=16

LogLevelError=15
LogLevelInfo=15
LogLevelConnection=15
LogLevelStartUp=15
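
For a rough sense of scale, the redo log and checkpoint settings above can be turned into sizes and write rates. The sketch below is back-of-envelope arithmetic, not a diagnosis. It assumes the unit conventions described in the MySQL 5.0 Cluster manual: the NoOfDiskPagesToDisk* values count 8 KB pages per 100 ms, NoOfFragmentLogFiles counts sets of four 16 MB files, and TimeBetweenLocalCheckpoints is the base-2 logarithm of a number of 4-byte words. The function names are made up for illustration.

```python
PAGE_BYTES = 8 * 1024               # assumed: one disk page is 8 KB
INTERVALS_PER_SECOND = 10           # assumed: page counts are per 100 ms
REDO_FILE_BYTES = 16 * 1024 * 1024  # assumed: one redo log file is 16 MB
REDO_FILES_PER_SET = 4              # assumed: NoOfFragmentLogFiles counts sets of 4
MIB = 1024 * 1024


def lcp_bytes_per_second(pages_per_interval):
    """Checkpoint write rate implied by a NoOfDiskPagesToDisk* value."""
    return pages_per_interval * PAGE_BYTES * INTERVALS_PER_SECOND


def redo_log_bytes(no_of_fragment_log_files):
    """Total redo log space implied by NoOfFragmentLogFiles."""
    return no_of_fragment_log_files * REDO_FILES_PER_SET * REDO_FILE_BYTES


def lcp_trigger_bytes(time_between_local_checkpoints):
    """Volume of writes after which a new local checkpoint is started."""
    return 4 * 2 ** time_between_local_checkpoints


# Values taken from the config.ini above.
tup_after_restart = lcp_bytes_per_second(60)
tup_during_restart = lcp_bytes_per_second(120)
redo_total = redo_log_bytes(40)
lcp_trigger = lcp_trigger_bytes(16)

# Worst case: all of DataMemory (2000M) is in use and must be written out.
full_lcp_seconds = 2000 * MIB / tup_after_restart

print("TUP rate after restart : %.2f MiB/s" % (tup_after_restart / MIB))
print("TUP rate during restart: %.2f MiB/s" % (tup_during_restart / MIB))
print("Redo log space         : %d MiB" % (redo_total // MIB))
print("LCP trigger volume     : %d KiB" % (lcp_trigger // 1024))
print("Full DataMemory LCP    : %.0f s" % full_lcp_seconds)
```

Under these assumptions, writing out a completely full DataMemory at the after-restart TUP rate takes on the order of minutes, so checkpoint write throughput by itself does not look like an obvious explanation for a restart that lasts days.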


Log :

2006-02-16 12:03:33 [MgmSrvr] INFO -- Node 10: Fragment 0: noLcpReplicas==0 0(on 11)=0(Idle) 1(on 10)=0(Idle)
2006-02-16 12:03:33 [MgmSrvr] INFO -- Node 10: Fragment 1: noLcpReplicas==0 0(on 11)=0(Idle) 1(on 10)=0(Idle)
2006-02-16 12:03:33 [MgmSrvr] INFO -- Node 10: Started arbitrator node 1 [ticket=0f99000272bdecc7]
2006-02-16 12:03:43 [MgmSrvr] INFO -- Mgmt server state: nodeid 11 reserved for ip *.*.*.21, m_reserved_nodes 0000000000300802.
2006-02-16 12:03:44 [MgmSrvr] INFO -- Node 1: Node 11 Connected
2006-02-16 12:03:44 [MgmSrvr] INFO -- Mgmt server state: nodeid 11 freed, m_reserved_nodes 0000000000300002.
2006-02-16 12:03:44 [MgmSrvr] INFO -- Node 11: Node 2 Connected
2006-02-16 12:03:57 [MgmSrvr] INFO -- Node 10: Communication to Node 11 opened
2006-02-16 12:11:13 [MgmSrvr] INFO -- Node 11: Start initiated (version 5.0.18)
2006-02-16 12:11:15 [MgmSrvr] INFO -- Node 11: Start phase 0 completed
2006-02-16 12:11:15 [MgmSrvr] INFO -- Node 11: Communication to Node 10 opened
2006-02-16 12:11:15 [MgmSrvr] INFO -- Node 11: Node 10 Connected
2006-02-16 12:11:15 [MgmSrvr] INFO -- Node 10: Node 11 Connected
2006-02-16 12:11:15 [MgmSrvr] INFO -- Node 11: CM_REGCONF president = 10, own Node = 11, our dynamic id = 3
2006-02-16 12:11:15 [MgmSrvr] INFO -- Node 10: Node 11: API version 5.0.18
2006-02-16 12:11:15 [MgmSrvr] INFO -- Node 11: Node 10: API version 5.0.18
2006-02-16 12:11:15 [MgmSrvr] INFO -- Node 11: Start phase 1 completed
2006-02-16 12:11:15 [MgmSrvr] INFO -- Node 11: Receive arbitrator node 1 [ticket=0f99000272bdecc7]
2006-02-16 12:11:15 [MgmSrvr] INFO -- Node 11: Start phase 2 completed (initial node restart)
2006-02-16 12:11:39 [MgmSrvr] INFO -- Node 11: Start phase 3 completed (initial node restart)
2006-02-16 12:11:49 [MgmSrvr] INFO -- Node 11: Start phase 4 completed (initial node restart)
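
To see where a restarting node stops making progress, the "Start initiated" and "Start phase N completed" events can be pulled out of a cluster log like the one above. This is a minimal sketch that assumes the exact line layout shown in this excerpt; `EVENT` and `restart_progress` are hypothetical names.

```python
import re
from datetime import datetime

# Timestamp, reporting node id, then either "Start initiated" or
# "Start phase N completed" (N is captured in group 4).
EVENT = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[MgmSrvr\] \w+ -- "
    r"Node (\d+): (Start initiated|Start phase (\d+) completed)"
)


def restart_progress(lines, node_id):
    """Return (start_time, last_phase, last_phase_time) for one node."""
    started = last_phase = last_time = None
    for line in lines:
        m = EVENT.match(line)
        if not m or int(m.group(2)) != node_id:
            continue
        when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if m.group(4) is None:
            # A new "Start initiated" resets the phase tracking.
            started, last_phase, last_time = when, None, None
        else:
            last_phase, last_time = int(m.group(4)), when
    return started, last_phase, last_time


# Lines copied from the log excerpt above.
LOG = [
    "2006-02-16 12:11:13 [MgmSrvr] INFO -- Node 11: Start initiated (version 5.0.18)",
    "2006-02-16 12:11:15 [MgmSrvr] INFO -- Node 11: Start phase 0 completed",
    "2006-02-16 12:11:15 [MgmSrvr] INFO -- Node 10: Node 11: API version 5.0.18",
    "2006-02-16 12:11:15 [MgmSrvr] INFO -- Node 11: Start phase 1 completed",
    "2006-02-16 12:11:15 [MgmSrvr] INFO -- Node 11: Start phase 2 completed (initial node restart)",
    "2006-02-16 12:11:39 [MgmSrvr] INFO -- Node 11: Start phase 3 completed (initial node restart)",
    "2006-02-16 12:11:49 [MgmSrvr] INFO -- Node 11: Start phase 4 completed (initial node restart)",
]

started, last_phase, last_time = restart_progress(LOG, 11)
elapsed = (last_time - started).total_seconds()
print("node 11: last completed phase %d, %.0f s after start" % (last_phase, elapsed))
```

In the excerpt above, node 11 gets through start phases 0 to 4 within seconds of "Start initiated", and no later phase is reported, so whatever is stalling happens after phase 4.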

