MySQL Forums

Re: Wait LCP to ensure durability
Posted by: Thomas Waibel-BGo
Date: November 25, 2017 02:33AM

We were not able to bring Node 6 back up this week, so the cluster ran degraded the whole time.
Undo space kept growing because local checkpoint 5029, which started on 2017-11-21 19:59:49, never finished.
Node 6 crashed on 2017-11-21 at 22:35.

I executed ALL DUMP 7010, ALL DUMP 7011, ALL DUMP 7012, ALL DUMP 7013 and ALL DUMP 7014 just now:
---
2017-11-25 09:19:32 [MgmtSrvr] INFO -- Node 3: c_lcpState.lcpStatusUpdatedPlace = 21355, cLcpStart = 0
2017-11-25 09:19:32 [MgmtSrvr] INFO -- Node 3: c_blockCommit = 0, c_blockCommitNo = 11
2017-11-25 09:19:32 [MgmtSrvr] INFO -- Node 4: c_lcpState.lcpStatusUpdatedPlace = 21355, cLcpStart = 0
2017-11-25 09:19:32 [MgmtSrvr] INFO -- Node 4: c_blockCommit = 0, c_blockCommitNo = 11
2017-11-25 09:19:32 [MgmtSrvr] INFO -- Node 5: c_lcpState.lcpStatusUpdatedPlace = 21355, cLcpStart = 0
2017-11-25 09:19:32 [MgmtSrvr] INFO -- Node 5: c_blockCommit = 0, c_blockCommitNo = 11
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 3: c_COPY_GCIREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 3: c_COPY_TABREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 3: c_UPDATE_FRAG_STATEREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 3: c_DIH_SWITCH_REPLICA_REQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 3: c_EMPTY_LCP_REQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 3: c_GCP_COMMIT_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 3: c_GCP_PREPARE_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 3: c_GCP_SAVEREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 3: c_SUB_GCP_COMPLETE_REP_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 3: c_INCL_NODEREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 3: c_MASTER_GCPREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 3: c_MASTER_LCPREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 3: c_START_INFOREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 3: c_START_RECREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 3: c_STOP_ME_REQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 3: c_TC_CLOPSIZEREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 3: c_TCGETOPSIZEREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 4: c_COPY_GCIREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 4: c_COPY_TABREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 4: c_UPDATE_FRAG_STATEREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 4: c_DIH_SWITCH_REPLICA_REQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 4: c_EMPTY_LCP_REQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 4: c_GCP_COMMIT_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 4: c_GCP_PREPARE_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 4: c_GCP_SAVEREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 4: c_SUB_GCP_COMPLETE_REP_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 4: c_INCL_NODEREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 4: c_MASTER_GCPREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 4: c_MASTER_LCPREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 4: c_START_INFOREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 4: c_START_RECREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 4: c_STOP_ME_REQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 4: c_TC_CLOPSIZEREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 4: c_TCGETOPSIZEREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 5: c_COPY_GCIREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 5: c_COPY_TABREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 5: c_UPDATE_FRAG_STATEREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 5: c_DIH_SWITCH_REPLICA_REQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 5: c_EMPTY_LCP_REQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 5: c_GCP_COMMIT_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 5: c_GCP_PREPARE_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 5: c_GCP_SAVEREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 5: c_SUB_GCP_COMPLETE_REP_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 5: c_INCL_NODEREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 5: c_MASTER_GCPREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 5: c_MASTER_LCPREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 5: c_START_INFOREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 5: c_START_RECREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 5: c_STOP_ME_REQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 5: c_TC_CLOPSIZEREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:19:57 [MgmtSrvr] INFO -- Node 5: c_TCGETOPSIZEREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:20:02 [MgmtSrvr] INFO -- Node 3: ParticipatingDIH = 0000000000000038
2017-11-25 09:20:02 [MgmtSrvr] INFO -- Node 3: ParticipatingLQH = 0000000000000038
2017-11-25 09:20:02 [MgmtSrvr] INFO -- Node 3: m_LCP_COMPLETE_REP_Counter_DIH = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:20:02 [MgmtSrvr] INFO -- Node 3: m_LCP_COMPLETE_REP_Counter_LQH = [SignalCounter: m_count=1 0000000000000008]
2017-11-25 09:20:02 [MgmtSrvr] INFO -- Node 3: m_LAST_LCP_FRAG_ORD = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:20:02 [MgmtSrvr] INFO -- Node 3: m_LCP_COMPLETE_REP_From_Master_Received = 0
2017-11-25 09:20:02 [MgmtSrvr] INFO -- Node 4: ParticipatingDIH = 0000000000000038
2017-11-25 09:20:02 [MgmtSrvr] INFO -- Node 4: ParticipatingLQH = 0000000000000038
2017-11-25 09:20:02 [MgmtSrvr] INFO -- Node 4: m_LCP_COMPLETE_REP_Counter_DIH = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:20:02 [MgmtSrvr] INFO -- Node 4: m_LCP_COMPLETE_REP_Counter_LQH = [SignalCounter: m_count=1 0000000000000008]
2017-11-25 09:20:02 [MgmtSrvr] INFO -- Node 4: m_LAST_LCP_FRAG_ORD = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:20:02 [MgmtSrvr] INFO -- Node 4: m_LCP_COMPLETE_REP_From_Master_Received = 0
2017-11-25 09:20:02 [MgmtSrvr] INFO -- Node 5: ParticipatingDIH = 0000000000000038
2017-11-25 09:20:02 [MgmtSrvr] INFO -- Node 5: ParticipatingLQH = 0000000000000038
2017-11-25 09:20:02 [MgmtSrvr] INFO -- Node 5: m_LCP_COMPLETE_REP_Counter_DIH = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:20:02 [MgmtSrvr] INFO -- Node 5: m_LCP_COMPLETE_REP_Counter_LQH = [SignalCounter: m_count=1 0000000000000008]
2017-11-25 09:20:02 [MgmtSrvr] INFO -- Node 5: m_LAST_LCP_FRAG_ORD = [SignalCounter: m_count=1 0000000000000008]
2017-11-25 09:20:02 [MgmtSrvr] INFO -- Node 5: m_LCP_COMPLETE_REP_From_Master_Received = 0
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 3: -- Node 3 LCP STATE --
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 3: lcpStatus = 10 (update place = 21355)
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 3: lcpStart = 0 lcpStopGcp = 22946293 keepGci = 0 oldestRestorable = 0
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 3: immediateLcpStart = 0 masterLcpNodeId = 5
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 3: 0 : status: 9 place: 11080
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 3: 1 : status: 2 place: 18117
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 3: 2 : status: 6 place: 17933
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 3: 3 : status: 5 place: 816
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 3: 4 : status: 0 place: 21736
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 3: 5 : status: 10 place: 21355
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 3: 6 : status: 9 place: 20883
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 3: 7 : status: 2 place: 18117
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 3: 8 : status: 6 place: 17933
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 3: 9 : status: 5 place: 816
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 3: -- Node 3 LCP STATE --
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 4: -- Node 4 LCP STATE --
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 4: lcpStatus = 10 (update place = 21355)
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 4: lcpStart = 0 lcpStopGcp = 22946293 keepGci = 0 oldestRestorable = 0
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 4: immediateLcpStart = 0 masterLcpNodeId = 5
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 4: 0 : status: 9 place: 11080
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 4: 1 : status: 2 place: 18117
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 4: 2 : status: 6 place: 17933
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 4: 3 : status: 5 place: 816
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 4: 4 : status: 0 place: 21736
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 4: 5 : status: 10 place: 21355
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 4: 6 : status: 9 place: 20883
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 4: 7 : status: 2 place: 18117
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 4: 8 : status: 6 place: 17933
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 4: 9 : status: 5 place: 816
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 4: -- Node 4 LCP STATE --
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 5: -- Node 5 LCP STATE --
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 5: lcpStatus = 10 (update place = 21355)
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 5: lcpStart = 0 lcpStopGcp = 22957140 keepGci = 22928313 oldestRestorable = 22937530
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 5: immediateLcpStart = 1 masterLcpNodeId = 5
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 5: 0 : status: 9 place: 11080
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 5: 1 : status: 8 place: 20348
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 5: 2 : status: 2 place: 18117
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 5: 3 : status: 6 place: 17933
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 5: 4 : status: 5 place: 816
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 5: 5 : status: 5 place: 20195
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 5: 6 : status: 4 place: 20073
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 5: 7 : status: 3 place: 20005
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 5: 8 : status: 7 place: 19989
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 5: 9 : status: 1 place: 19895
2017-11-25 09:20:08 [MgmtSrvr] INFO -- Node 5: -- Node 5 LCP STATE --
2017-11-25 09:20:10 [MgmtSrvr] INFO -- Node 3: -- Node 3 LCP MASTER TAKE OVER STATE --
2017-11-25 09:20:10 [MgmtSrvr] INFO -- Node 3: c_lcpMasterTakeOverState.state = 0 updatePlace = 23294 failedNodeId = 0
2017-11-25 09:20:10 [MgmtSrvr] INFO -- Node 3: c_lcpMasterTakeOverState.minTableId = 0 minFragId = 0
2017-11-25 09:20:10 [MgmtSrvr] INFO -- Node 3: -- Node 3 LCP MASTER TAKE OVER STATE --
2017-11-25 09:20:10 [MgmtSrvr] INFO -- Node 4: -- Node 4 LCP MASTER TAKE OVER STATE --
2017-11-25 09:20:10 [MgmtSrvr] INFO -- Node 4: c_lcpMasterTakeOverState.state = 0 updatePlace = 11537 failedNodeId = 3
2017-11-25 09:20:10 [MgmtSrvr] INFO -- Node 4: c_lcpMasterTakeOverState.minTableId = 0 minFragId = 0
2017-11-25 09:20:10 [MgmtSrvr] INFO -- Node 4: -- Node 4 LCP MASTER TAKE OVER STATE --
2017-11-25 09:20:10 [MgmtSrvr] INFO -- Node 5: -- Node 5 LCP MASTER TAKE OVER STATE --
2017-11-25 09:20:10 [MgmtSrvr] INFO -- Node 5: c_lcpMasterTakeOverState.state = 0 updatePlace = 20366 failedNodeId = 3
2017-11-25 09:20:10 [MgmtSrvr] INFO -- Node 5: c_lcpMasterTakeOverState.minTableId = 0 minFragId = 0
2017-11-25 09:20:10 [MgmtSrvr] INFO -- Node 5: -- Node 5 LCP MASTER TAKE OVER STATE --
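
For anyone reading the dump above, here is a rough sketch of how the SignalCounter lines could be scanned for outstanding signals. It rests on an assumption that is not confirmed anywhere in this thread: that the hex value is a node bitmask in which bit N stands for node N. That reading is at least consistent with ParticipatingDIH = 0000000000000038, which would decode to nodes 3, 4 and 5, the three nodes that are reporting. The function names and the regular expression are made up for illustration and are not part of any NDB tool.

```python
import re

# Matches cluster-log lines such as:
#   "... Node 5: m_LAST_LCP_FRAG_ORD = [SignalCounter: m_count=1 0000000000000008]"
COUNTER_RE = re.compile(
    r"Node (?P<reporter>\d+): (?P<name>\w+) = "
    r"\[SignalCounter: m_count=(?P<count>\d+) (?P<mask>[0-9a-fA-F]+)\]"
)


def nodes_in_mask(mask_hex):
    """Return the set bit positions of a hex mask.

    Assumption: bit N of the mask corresponds to node ID N.
    """
    value = int(mask_hex, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def pending_counters(log_text):
    """Yield (reporting node, counter name, node IDs in mask) for non-zero counters."""
    for match in COUNTER_RE.finditer(log_text):
        if int(match.group("count")) > 0:
            yield (
                int(match.group("reporter")),
                match.group("name"),
                nodes_in_mask(match.group("mask")),
            )


# Three lines copied from the Node 5 section of the dump above.
sample = """\
2017-11-25 09:20:02 [MgmtSrvr] INFO -- Node 5: m_LCP_COMPLETE_REP_Counter_DIH = [SignalCounter: m_count=0 0000000000000000]
2017-11-25 09:20:02 [MgmtSrvr] INFO -- Node 5: m_LCP_COMPLETE_REP_Counter_LQH = [SignalCounter: m_count=1 0000000000000008]
2017-11-25 09:20:02 [MgmtSrvr] INFO -- Node 5: m_LAST_LCP_FRAG_ORD = [SignalCounter: m_count=1 0000000000000008]
"""

for reporter, name, nodes in pending_counters(sample):
    print(f"Node {reporter}: {name} is non-zero, mask covers node(s) {nodes}")
```

If the bit-per-node assumption holds, the only non-zero masks in the dump (0000000000000008) would point at node 3, which may be a useful place to start looking, but please treat that as a reading aid rather than a diagnosis.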


I tried to force an LCP using ALL DUMP 7099, but since LCP 5029 has not finished, no new LCP is started.

We took a mysqldump backup, since the NDB backup has previously crashed the NDB cluster, and are now about to restart all data nodes. We have some tables with ndb_table_no_logging=1 and will lose some data, but we hope that restarting the data nodes will at least get LCPs running again.

