We decided to upgrade our cluster from 7.6 to 8.x (8.0.26). The rolling restart of the first data node appeared to succeed (it took 90 minutes, but that's normal since each data node has 180G allocated to it), yet the node was shut down about 4 seconds after it reported as started.
2021-09-21 17:08:26 [ndbd] INFO -- starting
2021-09-21 17:08:26 [ndbd] INFO -- Start phase 101 completed
2021-09-21 17:08:26 [ndbd] INFO -- Phase 101 was used by SUMA to take over responsibility for sending some of the asynchronous change events
2021-09-21 17:08:26 [ndbd] INFO -- Node started
2021-09-21 17:08:26 [ndbd] INFO -- Node 11 has completed its restart
For help with below stacktrace consult:
https://dev.mysql.com/doc/refman/en/using-stack-trace.html
Also note that stack_bottom and thread_stack will always show up as zero.
stack_bottom = 0 thread_stack 0x0
/usr/sbin/ndbmtd(my_print_stacktrace(unsigned char const*, unsigned long)+0x2e) [0x8cf4be]
/usr/sbin/ndbmtd(ndb_print_stacktrace()+0x45) [0x877f55]
/usr/sbin/ndbmtd(ErrorReporter::handleError(int, char const*, char const*, NdbShutdownType)+0x20) [0x82a810]
/usr/sbin/ndbmtd(SimulatedBlock::progError(int, int, char const*, char const*) const+0xf9) [0x891ae9]
/usr/sbin/ndbmtd(Dbtc::sendlqhkeyreq(Signal*, unsigned int, Dbtc::CacheRecord*, Dbtc::ApiConnectRecord*)+0x541) [0x69d541]
/usr/sbin/ndbmtd(Dbtc::packLqhkeyreq(Signal*, unsigned int, Ptr<Dbtc::CacheRecord>, Ptr<Dbtc::ApiConnectRecord>)+0x35) [0x6d4055]
/usr/sbin/ndbmtd(Dbtc::attrinfoDihReceivedLab(Signal*, Ptr<Dbtc::CacheRecord>, Ptr<Dbtc::ApiConnectRecord>)+0x135) [0x6d6f55]
/usr/sbin/ndbmtd(Dbtc::execTCKEYREQ(Signal*)+0x8ba) [0x6d909a]
/usr/sbin/ndbmtd() [0x89f898]
/usr/sbin/ndbmtd() [0x8a46ea]
/usr/sbin/ndbmtd(mt_job_thread_main+0x4ed) [0x8aa4fd]
/usr/sbin/ndbmtd() [0x872b36]
/lib64/libpthread.so.0(+0x84f9) [0x7f403f5094f9]
/lib64/libc.so.6(clone+0x3f) [0x7f403de06f2f]
2021-09-21 17:08:30 [ndbd] INFO -- /var/lib/pb2/sb_1-3697723-1625149027.99/rpm/BUILD/mysql-cluster-gpl-8.0.26/mysql-cluster-gpl-8.0.26/storage/ndb/src/kernel/blocks/dbtc/DbtcMain.cpp
2021-09-21 17:08:30 [ndbd] INFO -- DBTC (Line: 4879) 0x00000002 Check refToMain(TBRef) == 0xF7 failed
2021-09-21 17:08:30 [ndbd] INFO -- Error handler shutting down system
2021-09-21 17:08:32 [ndbd] ALERT -- Node 11: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Any insight into the cause, or suggestions on what to look at? (I wish ndbcluster gave better error messages.)
In the meantime, I'm going to roll the data node back to 7.6, since I can't really leave it offline.
Thanks