
Initial Restart of data node stuck in Phase 5
Posted by: In-Kap Bang
Date: March 02, 2019 09:13PM

Hello,

We had to increase the DataMemory/IndexMemory values for our 2-node cluster. After updating the configuration on the management node, we performed a rolling restart of the data nodes.
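For reference, the change in config.ini on the management node looked roughly like this (the values below are approximate, written from memory, and not our exact production figures):

[ndbd default]
# Increased in-memory storage for rows and for ordered indexes (approximate values)
DataMemory=110000M
IndexMemory=8536M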

The first data node completed its restart, we performed a failover, and then we tried to restart the second data node with --initial (a rough sketch of the commands is included after the log). However, after 17 hours it was still stuck in the "starting" state. Here is the log from the second data node:

2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T347F0. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T347F1. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T348F0. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T348F1. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T350F0. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T350F1. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T351F0. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T351F1. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T353F0. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T353F1. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T354F0. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T354F1. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T356F0. Changed +15/-801 rows, 46296 bytes. 5440 pct churn to 15 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T356F1. Changed +11/-805 rows, 46136 bytes. 7418 pct churn to 11 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T357F0. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T357F1. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T358F0. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T358F1. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T361F0. Changed +85/-542 rows, 39532 bytes. 737 pct churn to 85 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T361F1. Changed +85/-542 rows, 39532 bytes. 737 pct churn to 85 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T362F0. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T362F1. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T363F0. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2019-03-02 15:51:14 [ndbd] INFO -- LDM(0): Completed copy of fragment T363F1. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
H'0000597a H'0c400600 scanIndex_abort state: 3
H'001b9abb H'03400500 scanIndex_abort state: 3
H'00066461 H'0bf00500 scanIndex_abort state: 3
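
For completeness, the second node (node 3) was taken down and restarted with --initial roughly as follows (commands reconstructed from memory; substitute ndbmtd for ndbd if that is the binary in use):

# on the management node: stop data node 3
ndb_mgm -e "3 STOP"

# on the data node host: initial start, discarding the node's local
# checkpoint/redo files and copying all data back from node 2
ndbd --initial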


And here is the output from the management node:

ndb_mgm> all report memoryusage
Node 2: Data usage is 62%(2214386 32K pages of total 3520000)
Node 2: Index usage is 40%(441634 8K pages of total 1092640)
Node 3: Data usage is 62%(2201054 32K pages of total 3520000)
Node 3: Index usage is 40%(439326 8K pages of total 1092640)
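(For scale: 3,520,000 32K pages corresponds to roughly 107 GB of DataMemory, and 1,092,640 8K pages to roughly 8.3 GB of IndexMemory, so both nodes sit at about 62% data and 40% index usage of the new limits.)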

ndb_mgm> 3 report eventlog
2019-03-02 11:52:49 Node 3: Node 1 Connected
2019-03-02 11:53:09 Node 3: Communication to Node 2 opened
2019-03-02 11:53:09 Node 3: Node 2 Connected
2019-03-02 11:53:09 Node 3: Node 2: API mysql-5.6.38 ndb-7.4.17
2019-03-02 11:53:22 Node 3: create object id: 482 version: 1 (from 3)
2019-03-02 11:53:22 Node 3: create object id: 483 version: 1 (from 3)
2019-03-02 11:53:22 Node 3: create object id: 484 version: 1 (from 3)
2019-03-02 11:53:22 Node 3: create object id: 485 version: 1 (from 3)
2019-03-02 11:53:22 Node 3: create object id: 486 version: 1 (from 3)
2019-03-02 11:53:22 Node 3: create object id: 492 version: 1 (from 3)
2019-03-02 11:53:22 Node 3: create object id: 493 version: 1 (from 3)
2019-03-02 11:53:22 Node 3: create object id: 494 version: 1 (from 3)
2019-03-02 11:53:22 Node 3: create object id: 495 version: 1 (from 3)
2019-03-02 11:53:22 Node 3: create object id: 498 version: 1 (from 3)
2019-03-02 11:53:22 Node 3: create object id: 499 version: 1 (from 3)
2019-03-02 11:53:22 Node 3: create object id: 502 version: 1 (from 3)
2019-03-02 11:53:22 Node 3: create object id: 504 version: 1 (from 3)
2019-03-02 11:53:22 Node 3: create object id: 505 version: 1 (from 3)
2019-03-02 11:53:22 Node 3: create object id: 506 version: 1 (from 3)
2019-03-02 11:53:22 Node 3: create object id: 521 version: 2 (from 3)
2019-03-02 11:53:22 Node 3: create object id: 522 version: 2 (from 3)
2019-03-02 11:53:22 Node 3: create object id: 523 version: 4 (from 3)
2019-03-02 11:53:29 Node 3: DICT: activate index 521 done (sys/def/206/ix_kase_overdueTime)
2019-03-02 11:53:29 Node 3: DICT: activate index 522 done (sys/def/206/ix_kase_expirationTime)
2019-03-02 11:53:29 Node 3: DICT: activate index 523 done (sys/def/234/ix_KaseNotificationTracker_notificationClassificatio
2019-03-02 11:53:29 Node 3: Restore Database Off-line Starting on node 3
2019-03-02 11:53:29 Node 3: LDM instance 0: Restored LCP : 0 fragments, 0 rows, 0 millis, 0 rows/s
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 476
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 477
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 478
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 479
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 480
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 481
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 482
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 483
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 484
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 485
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 486
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 492
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 493
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 494
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 495
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 498
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 499
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 502
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 504
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 505
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 506
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 521
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 522
2019-03-02 11:53:29 Node 3: instace: 0 rebuild index: 523
2019-03-02 11:53:29 Node 3: Restore Database Off-line Completed on node 3
2019-03-02 11:53:29 Node 3: Bring Database On-line Starting on node 3
2019-03-02 15:51:14 Node 3: Table ID = 354, fragment ID = 1 have been synced to Node 3 rows: 1 bytes: 2
2019-03-02 15:51:14 Node 3: Table ID = 356, fragment ID = 0 have been synced to Node 3 rows: 816 bytes: 600
2019-03-02 15:51:14 Node 3: Table ID = 356, fragment ID = 1 have been synced to Node 3 rows: 816 bytes: 440
2019-03-02 15:51:14 Node 3: Table ID = 357, fragment ID = 0 have been synced to Node 3 rows: 1 bytes: 2
2019-03-02 15:51:14 Node 3: Table ID = 357, fragment ID = 1 have been synced to Node 3 rows: 1 bytes: 2
2019-03-02 15:51:14 Node 3: Table ID = 358, fragment ID = 0 have been synced to Node 3 rows: 1 bytes: 2
2019-03-02 15:51:14 Node 3: Table ID = 358, fragment ID = 1 have been synced to Node 3 rows: 1 bytes: 2
2019-03-02 15:51:14 Node 3: Table ID = 361, fragment ID = 0 have been synced to Node 3 rows: 627 bytes: 4420
2019-03-02 15:51:14 Node 3: Table ID = 361, fragment ID = 1 have been synced to Node 3 rows: 627 bytes: 4420
2019-03-02 15:51:14 Node 3: Table ID = 362, fragment ID = 0 have been synced to Node 3 rows: 1 bytes: 2
2019-03-02 15:51:14 Node 3: Table ID = 362, fragment ID = 1 have been synced to Node 3 rows: 1 bytes: 2
2019-03-02 15:51:14 Node 3: Table ID = 363, fragment ID = 0 have been synced to Node 3 rows: 1 bytes: 2
2019-03-02 15:51:14 Node 3: Table ID = 363, fragment ID = 1 have been synced to Node 3 rows: 1 bytes: 2
2019-03-02 20:09:55 Node 3: Trans. Count = 0, Commit Count = 0, Read Count = 0, Simple Read Count = 0, Write Count = 0, AttrInfo Count = 0, Concurrent Operations = 0, Abort Count = 0 Scans = 0 Range scans = 0, Local Read Count = 0 Local Write Count = 0
2019-03-02 20:09:57 Node 3: Mean send size to Node = 2 last 4096 sends = 9447 bytes
2019-03-02 20:10:00 Node 3: Operations=564
2019-03-02 20:10:00 Node 3: Trans. Count = 0, Commit Count = 0, Read Count = 0, Simple Read Count = 0, Write Count = 0, AttrInfo Count = 0, Concurrent Operations = 0, Abort Count = 0 Scans = 0 Range scans = 0, Local Read Count = 0 Local Write Count = 0
2019-03-02 20:10:02 Node 3: 8192 loops,tot 20486063 usec,exec 138 extra:loops = 12,time 22,const 50
2019-03-02 20:10:03 Node 3: Mean loop Counter in doJob last 8192 times = 43
2019-03-02 20:10:05 Node 3: 8192 loops,tot 2786036 usec,exec 247 extra:loops = 3,time 22,const 50
2019-03-02 20:10:05 Node 3: Mean loop Counter in doJob last 8192 times = 63
2019-03-02 20:10:05 Node 3: Operations=1296
2019-03-02 20:10:05 Node 3: Trans. Count = 0, Commit Count = 0, Read Count = 0, Simple Read Count = 0, Write Count = 0, AttrInfo Count = 0, Concurrent Operations = 0, Abort Count = 0 Scans = 0 Range scans = 0, Local Read Count = 0 Local Write Count = 0
2019-03-02 20:10:10 Node 3: Operations=281
2019-03-02 20:10:10 Node 3: Trans. Count = 0, Commit Count = 0, Read Count = 0, Simple Read Count = 0, Write Count = 0, AttrInfo Count = 0, Concurrent Operations = 0, Abort Count = 0 Scans = 0 Range scans = 0, Local Read Count = 0 Local Write Count = 0
2019-03-02 20:10:14 Node 3: 8192 loops,tot 8863236 usec,exec 134 extra:loops = 3,time 32,const 50
2019-03-02 20:10:15 Node 3: Operations=838
2019-03-02 20:10:15 Node 3: Trans. Count = 0, Commit Count = 0, Read Count = 0, Simple Read Count = 0, Write Count = 0, AttrInfo Count = 0, Concurrent Operations = 0, Abort Count = 0 Scans = 0 Range scans = 0, Local Read Count = 0 Local Write Count = 0

ndb_mgm> all status
Node 2: started (mysql-5.6.38 ndb-7.4.17)
Node 3: starting (Last completed phase 4) (mysql-5.6.38 ndb-7.4.17)


I'd be really grateful if anyone could tell me what I can do to recover the second data node.

Thank you in advance.
