Cluster failure
Posted by: Sucre Sucre
Date: November 01, 2021 09:10PM

Hello,
I was deleting about 180,000 records when the cluster failed.

Is this caused by a lack of hardware resources, or is there a configuration change I can make to prevent it from happening again?
Please help, thank you!

Here is the failure log:
###
2021-10-22 00:35:39 [ndbd] WARNING -- Watchdog: Warning overslept 471 ms, expected 100 ms.
2021-10-22 00:35:39 [ndbd] INFO -- timerHandlingLab, expected 10ms sleep, not scheduled for: 376 (ms), exec_time 12 us, sys_time 0 us
2021-10-22 17:10:33 [ndbd] INFO -- Node 2 disconnected in recv with errnum: 104 in state: 0
2021-10-22 17:10:33 [ndbd] INFO -- findNeighbours from: 5950 old (left: 2 right: 2) new (65535 65535)
2021-10-22 17:10:33 [ndbd] ALERT -- Network partitioning - arbitration required
2021-10-22 17:10:33 [ndbd] INFO -- President restarts arbitration thread [state=7]
2021-10-22 17:10:33 [ndbd] ALERT -- Arbitration won - positive reply from node 1
2021-10-22 17:10:33 [ndbd] INFO -- NR Status: node=2,OLD=Initial state,NEW=Node failed, fail handling ongoing
2021-10-22 17:10:33 [ndbd] INFO -- Master takeover started from 2
2021-10-22 17:10:33 [ndbd] INFO -- DBTC 0: Started failure handling for node 2
2021-10-22 17:10:33 [ndbd] INFO -- DBTC 0: Starting take over of node 2
2021-10-22 17:10:33 [ndbd] INFO -- DBTC 0: Step NF_CHECK_SCAN completed, failure handling for node 2 waiting for NF_TAKEOVER, NF_CHECK_TRANSACTION, NF_BLOCK_HANDLE.
2021-10-22 17:10:33 [ndbd] INFO -- DBTC 0: Step NF_BLOCK_HANDLE completed, failure handling for node 2 waiting for NF_TAKEOVER, NF_CHECK_TRANSACTION.
2021-10-22 17:10:33 [ndbd] INFO -- DBTC 0: GCP completion 30633610/1 waiting for node failure handling (1) to complete. Seizing record for GCP.
start_resend(0, empty bucket (30633610/1 30633610/0) -> active
2021-10-22 17:10:33 [ndbd] INFO -- DBTC 0: Step NF_CHECK_TRANSACTION completed, failure handling for node 2 waiting for NF_TAKEOVER.
2021-10-22 17:10:33 [ndbd] INFO -- Started arbitrator node 1 [ticket=be200068e2bfb284]
2021-10-22 17:10:33 [ndbd] INFO -- Adjusting disk write speed bounds due to : Node restart ongoing
2021-10-22 17:10:35 [ndbd] INFO -- DBTC 0: Completed take over of failed node 2
2021-10-22 17:10:35 [ndbd] INFO -- DBTC 0: Step NF_TAKEOVER completed, failure handling for node 2 complete.
2021-10-22 17:10:35 [ndbd] INFO -- DBTC 0: Completing GCP 30633610/1 on node failure takeover completion.
2021-10-22 17:10:35 [ndbd] INFO -- NR Status: node=2,OLD=Node failed, fail handling ongoing,NEW=Node failure handling complete
2021-10-22 17:10:35 [ndbd] INFO -- Node 2 has completed node fail handling
2021-10-22 17:10:36 [ndbd] INFO -- Adjusting disk write speed bounds due to : Node restart finished
job buffer full
Dumping non-empty job queues:
job buffer 0 --> 2, used 31 FULL!

job buffer full
Dumping non-empty job queues:
job buffer 0 --> 2, used 31 FULL!

For help with below stacktrace consult:
https://dev.mysql.com/doc/refman/en/using-stack-trace.html
Also note that stack_bottom and thread_stack will always show up as zero.
2021-10-22 17:14:11 [ndbd] INFO -- Received signal 6. Running error handler.
stack_bottom = 0 thread_stack 0x0
ndbmtd(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x89fc7d]
ndbmtd(ndb_print_stacktrace()+0x52) [0x84a0e2]
ndbmtd(handler_error+0xab) [0x4f04ab]
/lib64/libc.so.6(+0x36400) [0x7fb433e6a400]
/lib64/libc.so.6(gsignal+0x37) [0x7fb433e6a387]
/lib64/libc.so.6(abort+0x148) [0x7fb433e6ba78]
ndbmtd() [0x87041c]
ndbmtd() [0x874f7e]
ndbmtd(SimulatedBlock::sendSignal(unsigned int, unsigned short, Signal*, unsigned int, JobBufferLevel) const+0x195) [0x866915]
ndbmtd(Dbtc::releaseAndAbort(Signal*, Dbtc::ApiConnectRecord*)+0x144) [0x6821f4]
ndbmtd(Dbtc::abort015Lab(Signal*, Ptr<Dbtc::ApiConnectRecord>)+0x299) [0x695e69]
ndbmtd(Dbtc::execCONTINUEB(Signal*)+0x931) [0x6b92b1]
ndbmtd() [0x870c68]
ndbmtd() [0x875413]
ndbmtd(mt_job_thread_main+0x249) [0x87a009]
ndbmtd() [0x848308]
/lib64/libpthread.so.0(+0x7ea5) [0x7fb43550aea5]
/lib64/libc.so.6(clone+0x6d) [0x7fb433f328dd]
For help with below stacktrace consult:
https://dev.mysql.com/doc/refman/en/using-stack-trace.html
Also note that stack_bottom and thread_stack will always show up as zero.
stack_bottom = 0 thread_stack 0x0
ndbmtd(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x89fc7d]
ndbmtd(ndb_print_stacktrace()+0x52) [0x84a0e2]
ndbmtd(ErrorReporter::handleError(int, char const*, char const*, NdbShutdownType)+0x2f) [0x80678f]
ndbmtd(handler_error+0x100) [0x4f0500]
/lib64/libc.so.6(+0x36400) [0x7fb433e6a400]
/lib64/libc.so.6(gsignal+0x37) [0x7fb433e6a387]
/lib64/libc.so.6(abort+0x148) [0x7fb433e6ba78]
ndbmtd() [0x87041c]
ndbmtd() [0x874f7e]
ndbmtd(SimulatedBlock::sendSignal(unsigned int, unsigned short, Signal*, unsigned int, JobBufferLevel) const+0x195) [0x866915]
ndbmtd(Dbtc::releaseAndAbort(Signal*, Dbtc::ApiConnectRecord*)+0x144) [0x6821f4]
ndbmtd(Dbtc::abort015Lab(Signal*, Ptr<Dbtc::ApiConnectRecord>)+0x299) [0x695e69]
ndbmtd(Dbtc::execCONTINUEB(Signal*)+0x931) [0x6b92b1]
ndbmtd() [0x870c68]
ndbmtd() [0x875413]
ndbmtd(mt_job_thread_main+0x249) [0x87a009]
ndbmtd() [0x848308]
/lib64/libpthread.so.0(+0x7ea5) [0x7fb43550aea5]
/lib64/libc.so.6(clone+0x6d) [0x7fb433f328dd]
2021-10-22 17:14:11 [ndbd] INFO -- Signal 6 received; Aborted
2021-10-22 17:14:11 [ndbd] INFO -- /export/home2/pb2/build/sb_1-39758149-1592609781.29/rpm/BUILD/mysql-cluster-gpl-8.0.21/mysql-cluster-gpl-8.0.21/storage/ndb/src/kernel/ndbd.cpp
2021-10-22 17:14:11 [ndbd] INFO -- Error handler signal shutting down system
2021-10-22 17:14:11 [ndbd] INFO -- Error handler shutdown completed - exiting
2021-10-22 17:14:11 [ndbd] ALERT -- Node 3: Forced node shutdown completed. Initiated by signal 6. Caused by error 6000: 'Error OS signal received(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2021-10-22 19:22:59 [ndbd] INFO -- Angel pid: 26660 started child: 26661
###

Sincerely,
Sucre



Edited 2 time(s). Last edit at 12/12/2021 08:12PM by Sucre Sucre.
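
For anyone triaging a data node log like the one above, here is a minimal sketch of a filter that keeps only the timestamped WARNING and ALERT entries, so the key events (the watchdog warning, the arbitration, the forced shutdown) stand out from the INFO noise. The line format is inferred from the log in this post only. The names `parse_entries`, `important` and `Entry` are hypothetical helpers, not part of any MySQL tool, and untimestamped lines such as "job buffer full" are deliberately skipped by this parser.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed line shape, taken from the log in this post:
#   2021-10-22 17:10:33 [ndbd] ALERT -- Network partitioning - arbitration required
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<source>\w+)\] (?P<level>[A-Z]+)\s+-- (?P<msg>.*)$"
)


class Entry(NamedTuple):
    ts: str
    source: str
    level: str
    msg: str


def parse_entries(lines: Iterable[str]) -> List[Entry]:
    """Return one Entry per line that matches the assumed format."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # stack frames and bare messages do not match and are skipped
            entries.append(Entry(**m.groupdict()))
    return entries


def important(entries: Iterable[Entry],
              levels=("WARNING", "ALERT")) -> List[Entry]:
    """Keep only entries whose severity is in `levels`."""
    return [e for e in entries if e.level in levels]


# A few lines copied from the log in this post.
SAMPLE = """\
2021-10-22 00:35:39 [ndbd] WARNING -- Watchdog: Warning overslept 471 ms, expected 100 ms.
2021-10-22 17:10:33 [ndbd] INFO -- Node 2 disconnected in recv with errnum: 104 in state: 0
2021-10-22 17:10:33 [ndbd] ALERT -- Network partitioning - arbitration required
2021-10-22 17:10:33 [ndbd] ALERT -- Arbitration won - positive reply from node 1
job buffer full
"""

for entry in important(parse_entries(SAMPLE.splitlines())):
    print(entry.ts, entry.level, entry.msg)
```

To run it against a real file, pass the open file object to `parse_entries` instead of `SAMPLE.splitlines()`. The path to the node log depends on the installation.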