Hello,
Both of the data nodes in our cluster connect successfully and then, after a very short time, disconnect. The log files on these nodes show that 'Ndb kernel thread 0 is stuck' in either 'Job Handling' or 'Performing Send'. I've looked around online and see that there have been issues with Intel Xeon processors when NUMA is enabled (which we have). However, we have another cluster with a very similar configuration, on another machine that also has an Intel Xeon with NUMA enabled, and it is working just fine. We currently have two SQL + management nodes and two data nodes. The following is from the log file, where a node first hung on 'Job Handling' and then, after we tried again, hung on 'Performing Send':
https://pastebin.com/KJ4t650q
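For anyone who wants to summarise the watchdog messages rather than read the whole log, here is a rough Python sketch that tallies how often each kernel thread is reported stuck and in which state. The message shape in the regex is an assumption based on the wording quoted above (something like "Ndb kernel thread 0 is stuck in: Job Handling elapsed=100"), and the function name and log path are only illustrative, so adjust both to match your actual log.

```python
import re
import sys
from collections import Counter

# Assumed message shape (based on the wording quoted in this post):
#   "... Ndb kernel thread 0 is stuck in: Job Handling elapsed=100"
# The trailing "elapsed=N" part is treated as optional.
STUCK_RE = re.compile(
    r"Ndb kernel thread (\d+) is stuck in:?\s*(.+?)(?:\s+elapsed=(\d+))?\s*$"
)


def count_stuck_states(lines):
    """Count watchdog 'stuck' messages per (thread id, state) pair."""
    counts = Counter()
    for line in lines:
        match = STUCK_RE.search(line)
        if match:
            thread_id = int(match.group(1))
            state = match.group(2).strip()
            counts[(thread_id, state)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python stuck_summary.py /path/to/data_node.log
    with open(sys.argv[1], errors="replace") as log_file:
        summary = count_stuck_states(log_file)
    for (thread_id, state), n in summary.most_common():
        print(f"thread {thread_id} stuck in {state!r}: {n} message(s)")
```

This only counts messages; it does not say why the thread is stuck, but it makes it easier to compare the two failed start attempts.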
Our configs are as follows:
sql1/2 my.cnf:
[mysqld]
ndbcluster
ndb-connectstring=sql1,sql2
port=3306
default_storage_engine=ndbcluster
[mysql_cluster]
ndb-connectstring=sql1,sql2
sql1/2 config.ini:
[ndb_mgmd default]
DataDir=/usr/local/mysql/mysql-cluster
[ndb_mgmd]
NodeId=1
HostName=sql1
[ndb_mgmd]
NodeId=2
HostName=sql2
[ndbd default]
NoOfReplicas=2
DataMemory=8192M
IndexMemory=8192M
DataDir=/usr/local/mysql/mysql-cluster
[ndbd]
NodeId=3
HostName=data1
[ndbd]
NodeId=4
HostName=data2
[mysqld]
[mysqld]
data1/2 my.cnf:
[mysqld]
ndbcluster
ndb-connectstring=sql1,sql2
[mysql_cluster]
ndb-connectstring=sql1,sql2
We have tried re-imaging this cluster, but the same issues persist. Any suggestions would be greatly appreciated!
Edited 1 time(s). Last edit at 01/15/2018 05:31PM by Andrew Fisher.