
Wait LCP to ensure durability
Posted by: Stefan Auweiler
Date: September 28, 2017 03:46AM

We had to restart a node because of a HW/disk failure, and now we cannot bring the node back up: it seems to be waiting for the next LCP...

Running mysql-5.6.37 ndb-7.4.16
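
(For reference, this is roughly how we are watching the restart from the management client; just a minimal sketch, the node ID 5 is from our setup and the connect string is the default:

ndb_mgm -e "SHOW"
ndb_mgm -e "5 STATUS"
ndb_mgm -e "ALL REPORT MEMORYUSAGE"

The STATUS command is what keeps showing node 5 as still starting.)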


Everything seems to start fine until Phase 5, which doesn't seem to reach its end:

Node 5:


2017-09-28 00:42:20 [ndbd] INFO -- Start phase 4 completed
2017-09-28 00:42:20 [ndbd] INFO -- Phase 4 continued preparations of the REDO log
2017-09-28 00:42:20 [ndbd] INFO -- Request copying of distribution and dictionary information from master Starting
2017-09-28 00:42:22 [ndbd] INFO -- Copying of dictionary information from master Starting
2017-09-28 00:42:24 [ndbd] INFO -- Copying of dictionary information from master Completed
2017-09-28 00:42:24 [ndbd] INFO -- Request copying of distribution and dictionary information from master Completed
2017-09-28 00:42:24 [ndbd] INFO -- NDB start phase 4 completed
2017-09-28 00:42:24 [ndbd] INFO -- Start NDB start phase 5 (only to DBDIH)
2017-09-28 00:42:24 [ndbd] INFO -- Restore Database Off-line Starting
2017-09-28 00:42:24 [ndbd] INFO -- All start fragments sent, requesting LDM to restore all fragments and to execute the REDO log to bring the database to an off-line but consistent state

.... a lot of index creation and data copying ....

2017-09-28 00:42:51 [ndbd] INFO -- LDM(15): Completed copy of fragment T79F233. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2017-09-28 00:42:51 [ndbd] INFO -- LDM(15): Completed copy of fragment T80F225. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2017-09-28 00:42:51 [ndbd] INFO -- LDM(15): Completed copy of fragment T80F233. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2017-09-28 00:42:51 [ndbd] INFO -- LDM(3): Completed copy of fragment T75F41. Changed +30691/-92315 rows, 8238924 bytes. 400 pct churn to 30693 rows.
2017-09-28 00:42:51 [ndbd] INFO -- LDM(3): Completed copy of fragment T76F33. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2017-09-28 00:42:51 [ndbd] INFO -- LDM(3): Completed copy of fragment T76F41. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2017-09-28 00:42:51 [ndbd] INFO -- LDM(3): Completed copy of fragment T77F33. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2017-09-28 00:42:51 [ndbd] INFO -- LDM(3): Completed copy of fragment T77F41. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2017-09-28 00:42:51 [ndbd] INFO -- LDM(3): Completed copy of fragment T78F33. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2017-09-28 00:42:51 [ndbd] INFO -- LDM(3): Completed copy of fragment T78F41. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2017-09-28 00:42:51 [ndbd] INFO -- LDM(3): Completed copy of fragment T79F33. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2017-09-28 00:42:51 [ndbd] INFO -- LDM(3): Completed copy of fragment T79F41. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2017-09-28 00:42:51 [ndbd] INFO -- LDM(3): Completed copy of fragment T80F33. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2017-09-28 00:42:51 [ndbd] INFO -- LDM(3): Completed copy of fragment T80F41. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2017-09-28 00:42:51 [ndbd] INFO -- Bring Database On-line Completed
2017-09-28 00:42:51 [ndbd] INFO -- Starting REDO logging
2017-09-28 00:42:51 [ndbd] INFO -- Make On-line Database recoverable by waiting for LCP Starting, all parallel threads have now ceased their activity and we have a single wait state here
2017-09-28 03:15:07 [ndbd] INFO -- Adjusting disk write speed bounds due to : Backup started
2017-09-28 03:16:20 [ndbd] INFO -- Backup log buffer report : size 17039360 bytes, hwm 326308 bytes (1 pct)
2017-09-28 03:16:22 [ndbd] INFO -- Adjusting disk write speed bounds due to : Backup completed


This has been the state for about 8 hours now...
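
One thing that should at least show whether any LCP activity is being reported is raising the checkpoint log level on the management node (a sketch, assuming the standard CLUSTERLOG categories; 15 is just the maximum level):

ndb_mgm -e "ALL CLUSTERLOG CHECKPOINT=15"

With that, the "Local checkpoint ... started/completed" messages should appear in the cluster log if an LCP ever begins.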

On Node 6 (the replica):

2017-09-28 00:20:31 [ndbd] INFO -- Node 5 has completed node fail handling
2017-09-28 00:37:07 [ndbd] INFO -- findNeighbours from: 2248 old (left: 4 right: 8) new (4 8)
2017-09-28 00:42:19 [ndbd] INFO -- Started invalidation of node 5
2017-09-28 00:42:19 [ndbd] INFO -- Completed invalidation of node 5
2017-09-28 00:42:19 [ndbd] INFO -- NR Status: node=5,OLD=Node failure handling complete,NEW=All nodes permitted us
2017-09-28 00:42:24 [ndbd] INFO -- NR Status: node=5,OLD=All nodes permitted us,NEW=Include node in LCP/GCP protocols
2017-09-28 00:42:24 [ndbd] INFO -- NR Status: node=5,OLD=Include node in LCP/GCP protocols,NEW=Synchronize start node with live nodes
Ignoring scan.m_state == ScanOp::Blocked, refetch
Ignoring scan.m_state == ScanOp::Blocked, refetch
Ignoring scan.m_state == ScanOp::Blocked, refetch
Ignoring scan.m_state == ScanOp::Blocked, refetch
Ignoring scan.m_state == ScanOp::Blocked, refetch
Ignoring scan.m_state == ScanOp::Blocked, refetch
Ignoring scan.m_state == ScanOp::Blocked, refetch
2017-09-28 00:42:51 [ndbd] INFO -- NR Status: node=5,OLD=Synchronize start node with live nodes,NEW=Wait LCP to ensure durability
2017-09-28 03:15:06 [ndbd] INFO -- Adjusting disk write speed bounds due to : Backup started
2017-09-28 03:16:19 [ndbd] INFO -- Backup log buffer report : size 17039360 bytes, hwm 283952 bytes (1 pct)
2017-09-28 03:16:19 [ndbd] INFO -- Adjusting disk write speed bounds due to : Backup completed



On Node 1 (the management server):

2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 5: Node restart completed copying the fragments to Node 5
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 5: Bring Database On-line Completed on node 5
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 5: Starting REDO logging on node 5
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 5: LDM(1): CopyFrag complete. 122 frags, +1158054/-1069371 rows, 355073368 bytes/27292 ms 13010162
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 5: LDM(2): CopyFrag complete. 124 frags, +1159354/-1125594 rows, 358707920 bytes/27292 ms 13143335
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 5: LDM(3): CopyFrag complete. 126 frags, +1170642/-1602995 rows, 387661512 bytes/27292 ms 14204217
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 5: LDM(4): CopyFrag complete. 122 frags, +1151585/-1078742 rows, 352054332 bytes/27292 ms 12899543
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 5: LDM(5): CopyFrag complete. 122 frags, +1153245/-1085182 rows, 353370352 bytes/27292 ms 12947763
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 5: LDM(6): CopyFrag complete. 122 frags, +1160755/-1057196 rows, 355492268 bytes/27292 ms 13025511
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 5: LDM(7): CopyFrag complete. 122 frags, +1158949/-1071535 rows, 355111804 bytes/27292 ms 13011571
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 5: LDM(8): CopyFrag complete. 122 frags, +1139816/-1080159 rows, 346548228 bytes/27292 ms 12697795
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 5: LDM(9): CopyFrag complete. 122 frags, +1142529/-1084242 rows, 348131352 bytes/27292 ms 12755802
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 5: LDM(13): CopyFrag complete. 122 frags, +1154890/-1060333 rows, 352134648 bytes/27292 ms 1290248
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 5: LDM(14): CopyFrag complete. 122 frags, +1134603/-1090741 rows, 345012724 bytes/27291 ms 1264199
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 5: LDM(15): CopyFrag complete. 122 frags, +1149243/-1071930 rows, 350748056 bytes/27291 ms 1285215
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 5: LDM(16): CopyFrag complete. 122 frags, +1135839/-1086801 rows, 345380012 bytes/27291 ms 1265545
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 5: LDM(10): CopyFrag complete. 122 frags, +1157643/-1065288 rows, 354863748 bytes/27292 ms 1300248
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 5: LDM(11): CopyFrag complete. 122 frags, +1133592/-1086868 rows, 344716888 bytes/27292 ms 1263069
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 5: LDM(12): CopyFrag complete. 122 frags, +1146841/-1079050 rows, 350373596 bytes/27292 ms 1283795
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 9: NR Status: node=5,OLD=Synchronize start node with live nodes,NEW=Wait LCP to ensure durability
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 7: NR Status: node=5,OLD=Synchronize start node with live nodes,NEW=Wait LCP to ensure durability
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 11: NR Status: node=5,OLD=Synchronize start node with live nodes,NEW=Wait LCP to ensure durability
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 3: Make On-line Database recoverable by waiting for LCP Starting on node 5, LCP id 551370
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 3: NR Status: node=5,OLD=Synchronize start node with live nodes,NEW=Wait LCP to ensure durability
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 12: NR Status: node=5,OLD=Synchronize start node with live nodes,NEW=Wait LCP to ensure durability
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 14: NR Status: node=5,OLD=Synchronize start node with live nodes,NEW=Wait LCP to ensure durability
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 15: NR Status: node=5,OLD=Synchronize start node with live nodes,NEW=Wait LCP to ensure durability
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 16: NR Status: node=5,OLD=Synchronize start node with live nodes,NEW=Wait LCP to ensure durability
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 18: NR Status: node=5,OLD=Synchronize start node with live nodes,NEW=Wait LCP to ensure durability
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 8: NR Status: node=5,OLD=Synchronize start node with live nodes,NEW=Wait LCP to ensure durability
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 4: NR Status: node=5,OLD=Synchronize start node with live nodes,NEW=Wait LCP to ensure durability
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 17: NR Status: node=5,OLD=Synchronize start node with live nodes,NEW=Wait LCP to ensure durability
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 6: NR Status: node=5,OLD=Synchronize start node with live nodes,NEW=Wait LCP to ensure durability
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 10: NR Status: node=5,OLD=Synchronize start node with live nodes,NEW=Wait LCP to ensure durability
2017-09-28 00:42:51 [MgmtSrvr] INFO -- Node 13: NR Status: node=5,OLD=Synchronize start node with live nodes,NEW=Wait LCP to ensure durability
....
2017-09-28 00:43:07 [MgmtSrvr] INFO -- Node 5: Data usage is 11%(151129 32K pages of total 1280000)
2017-09-28 00:43:07 [MgmtSrvr] INFO -- Node 5: Index usage is 5%(34511 8K pages of total 640512)


So it seems everybody is waiting for the LCP...
We first tried starting the cluster node normally, then using --initial; both attempts ended in the same state.
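
One thing we are considering, but have not run yet, is forcing an LCP by hand from the management client. As far as I understand, DUMP 7099 asks DBDIH to start a local checkpoint immediately; please correct me if that is wrong or unsafe on 7.4.16:

ndb_mgm -e "ALL DUMP 7099"

We also want to double-check the LCP trigger settings in config.ini. These are the documented defaults, not necessarily our values:

[ndbd default]
# base-2 log of the write volume (in 4-byte words) that triggers an LCP; 20 means 4 MB
TimeBetweenLocalCheckpoints=20
# milliseconds between global checkpoints
TimeBetweenGlobalCheckpoints=2000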

Any idea how to solve this?

Thanks
Stefan



Edited 1 time(s). Last edit at 09/28/2017 03:51AM by Stefan Auweiler.
