MySQL Forums

Re: Wait LCP to ensure durability
Posted by: Thomas Waibel-BGo
Date: November 25, 2017 11:44AM

Hi Mikael,
thanks for looking into this on a Saturday! Unfortunately your response came a little too late, and we are already sweating blood here:

We executed the restart at around 09:39:41. Node 6 was missed there...

We started applying undo logs at around 10:12. By then we had around 1.3 million pages in the undo space to work through...

At 16:03 we got the following error on node 4:
---
2017-11-25 16:03:42 [ndbd] INFO -- LGMAN: Applying Undo log - 827259 pages completed, applied 521100000 records, reached LSN 21410526021
2017-11-25 16:03:43 [ndbd] INFO -- LGMAN: Applying Undo log - 827306 pages completed, applied 521130000 records, reached LSN 21410496021
2017-11-25 16:03:44 [ndbd] INFO -- LGMAN: File undo1.log have wrong pageLSN in page: 32767
2017-11-25 16:03:44 [ndbd] INFO -- Error while reading the datapages and UNDO log
2017-11-25 16:03:44 [ndbd] INFO -- LGMAN (Line: 4426) 0x00000000
2017-11-25 16:03:44 [ndbd] INFO -- Error handler restarting system
2017-11-25 16:03:44 [ndbd] INFO -- Error handler shutdown completed - exiting
2017-11-25 16:04:55 [ndbd] INFO -- Angel detected startup failure, count: 1
2017-11-25 16:04:55 [ndbd] ALERT -- Node 4: Forced node shutdown completed. Occured during startphase 4. Caused by error 2313: 'Error while reading the datapages and UNDO log(Ndbd file system inconsistency error, please report a bug). Ndbd file system error, restart node initial'.
2017-11-25 16:04:55 [ndbd] INFO -- Ndb has terminated (pid 15499) restarting
2017-11-25 16:04:55 [ndbd] INFO -- Angel reconnected to '10.20.56.5:1186'
2017-11-25 16:04:55 [ndbd] INFO -- Angel reallocated nodeid: 4
2017-11-25 16:04:55 [ndbd] INFO -- Angel pid: 15498 started child: 31734
---

All 4 nodes automatically restarted after the forced node shutdown - at least node 6 is now included in the node group...

We do not know whether that wrong pageLSN error was just a hiccup, or what to do if it happens again...

Restoring the whole system (from a 380 GB compressed mysqldump backup) will be a nightmare...
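
For anyone watching a similar restart, here is a rough Python sketch of how the LGMAN progress lines quoted above could be turned into a throughput figure and a very crude time estimate. It is not an official tool: the function names are made up for illustration, the regular expression only matches the log format shown in this post, and the total of 1.3 million pages is our own approximate number. It also assumes that every page in the undo space has to be processed and that the rate stays constant, neither of which is guaranteed.

```python
import re
from datetime import datetime

# Matches the LGMAN progress lines as they appear in the excerpt above.
# Whitespace is matched loosely because spacing may differ between logs.
PROGRESS = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+\[ndbd\]\s+INFO\s+--\s+"
    r"LGMAN: Applying Undo log - (\d+) pages completed"
)


def undo_progress(lines):
    """Return a list of (timestamp, pages_completed) from progress lines."""
    samples = []
    for line in lines:
        match = PROGRESS.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            samples.append((stamp, int(match.group(2))))
    return samples


def estimate_remaining_seconds(samples, total_pages):
    """Linear extrapolation from the first to the last sample.

    Returns None when fewer than two samples exist or no progress was made.
    total_pages is an assumption supplied by the caller.
    """
    if len(samples) < 2:
        return None
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    elapsed = (t1 - t0).total_seconds()
    if elapsed <= 0 or p1 <= p0:
        return None
    rate = (p1 - p0) / elapsed  # pages per second
    return max(total_pages - p1, 0) / rate


def find_page_lsn_errors(lines):
    """Return the lines that report a wrong pageLSN."""
    return [line for line in lines if "wrong pageLSN" in line]


# Three lines copied from the node 4 log above.
SAMPLE = """\
2017-11-25 16:03:42 [ndbd] INFO -- LGMAN: Applying Undo log - 827259 pages completed, applied 521100000 records, reached LSN 21410526021
2017-11-25 16:03:43 [ndbd] INFO -- LGMAN: Applying Undo log - 827306 pages completed, applied 521130000 records, reached LSN 21410496021
2017-11-25 16:03:44 [ndbd] INFO -- LGMAN: File undo1.log have wrong pageLSN in page: 32767
"""

if __name__ == "__main__":
    log_lines = SAMPLE.splitlines()
    samples = undo_progress(log_lines)
    remaining = estimate_remaining_seconds(samples, total_pages=1_300_000)
    if remaining is not None:
        print(f"roughly {remaining / 3600:.1f} hours left at the current rate")
    for line in find_page_lsn_errors(log_lines):
        print("pageLSN error:", line)
```

Two samples taken one second apart give a very noisy rate, so in practice it would make more sense to feed in a longer stretch of the node log before trusting the number.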

