Is NDB7.5.11's generic Linux binary a bad one?
Posted by: Nimbi lin
Date: November 17, 2018 07:00AM
Dear NDB genius engineers,
Is the NDB 7.5.11 generic Linux binary a bad one? After I inserted several million rows into 2 NDB tables, data node 23 logged the following errors:
2018-11-17 20:38:17 [ndbd] INFO -- Watchdog: User time: 122 System time: 173628
2018-11-17 20:38:17 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=6071
2018-11-17 20:38:17 [ndbd] INFO -- Watchdog: User time: 122 System time: 173635
2018-11-17 20:38:17 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=6171
2018-11-17 20:38:30 [ndbd] INFO -- Watchdog: User time: 122 System time: 179714
2018-11-17 20:38:30 [ndbd] INFO -- Watchdog: User time: 122 System time: 179721
2018-11-17 20:38:30 [ndbd] WARNING -- Watchdog: Warning overslept 12751 ms, expected 100 ms.
2018-11-17 20:38:30 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=18922
2018-11-17 20:38:30 [ndbd] INFO -- Watchdog: User time: 122 System time: 179721
2018-11-17 20:38:31 [ndbd] INFO -- Received signal 6. Running error handler.
2018-11-17 20:38:31 [ndbd] INFO -- Child process terminated by signal 6
2018-11-17 20:38:31 [ndbd] ALERT -- Node 23: Forced node shutdown completed. Occured during startphase 0. Initiated by signal 6.
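To make the watchdog lines above easier to read, here is a minimal sketch that extracts the user/system time counters and the "stuck in" events from them. The helper name `summarize_watchdog` and both regular expressions are my own assumptions, inferred only from the log lines quoted in this post; they are not part of any NDB tool. The sample lines are copied from the data node log above.

```python
import re

# Hypothetical helper, not an NDB utility: both patterns are inferred
# only from the log lines quoted in this post.
WATCHDOG_RE = re.compile(r"Watchdog: User time: (\d+)\s+System time: (\d+)")
STUCK_RE = re.compile(r"is stuck in: (.+?) elapsed=(\d+)")


def summarize_watchdog(lines):
    """Return (user_times, system_times, stuck_events) parsed from ndbd log lines."""
    user, system, stuck = [], [], []
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m:
            user.append(int(m.group(1)))
            system.append(int(m.group(2)))
            continue
        m = STUCK_RE.search(line)
        if m:
            # (what the thread was doing, reported elapsed value)
            stuck.append((m.group(1), int(m.group(2))))
    return user, system, stuck


sample = [
    "2018-11-17 20:38:17 [ndbd] INFO -- Watchdog: User time: 122 System time: 173628",
    "2018-11-17 20:38:17 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=6071",
    "2018-11-17 20:38:30 [ndbd] INFO -- Watchdog: User time: 122 System time: 179721",
    "2018-11-17 20:38:30 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=18922",
]

user, system, stuck = summarize_watchdog(sample)
print("user time delta:", user[-1] - user[0])
print("system time delta:", system[-1] - system[0])
print("longest stall:", max(stuck, key=lambda s: s[1]))
```

On these lines the user counter stays at 122 while the system counter keeps growing, which is the pattern I would like explained.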
The management node's log shows:
2018-11-17 20:33:54 [MgmtSrvr] INFO -- Nodeid 23 allocated for NDB at 192.168.70.13
2018-11-17 20:33:55 [MgmtSrvr] INFO -- Node 22: Node 23 Connected
2018-11-17 20:34:05 [MgmtSrvr] INFO -- Alloc node id 24 failed, no new president yet
2018-11-17 20:34:05 [MgmtSrvr] INFO -- Nodeid 24 allocated for NDB at 192.168.70.14
2018-11-17 20:34:18 [MgmtSrvr] INFO -- Node 22: Node 24 Connected
2018-11-17 20:37:34 [MgmtSrvr] ALERT -- Node 22: Node 23 Disconnected
2018-11-17 20:38:13 [MgmtSrvr] ALERT -- Node 23: Forced node shutdown completed. Occured during startphase 0. Initiated by signal 6.
2018-11-17 20:41:04 [MgmtSrvr] ALERT -- Node 22: Node 24 Disconnected
2018-11-17 20:41:22 [MgmtSrvr] INFO -- Node 22: Node 24 Connected
2018-11-17 20:41:23 [MgmtSrvr] INFO -- Node 24: Communication to Node 23 opened
2018-11-17 20:41:23 [MgmtSrvr] INFO -- Node 24: Waiting 30 sec for nodes 23 to connect, nodes [ all: 23 and 24 connected: 24 no-wait: ]
2018-11-17 20:41:27 [MgmtSrvr] INFO -- Node 24: Waiting 27 sec for nodes 23 to connect, nodes [ all: 23 and 24 connected: 24 no-wait: ]
2018-11-17 20:41:30 [MgmtSrvr] INFO -- Node 24: Waiting 24 sec for nodes 23 to connect, nodes [ all: 23 and 24 connected: 24 no-wait: ]
2018-11-17 20:41:33 [MgmtSrvr] INFO -- Node 24: Waiting 21 sec for nodes 23 to connect, nodes [ all: 23 and 24 connected: 24 no-wait: ]
2018-11-17 20:41:36 [MgmtSrvr] INFO -- Node 24: Waiting 18 sec for nodes 23 to connect, nodes [ all: 23 and 24 connected: 24 no-wait: ]
2018-11-17 20:41:39 [MgmtSrvr] INFO -- Node 24: Waiting 15 sec for nodes 23 to connect, nodes [ all: 23 and 24 connected: 24 no-wait: ]
2018-11-17 20:41:42 [MgmtSrvr] INFO -- Node 24: Waiting 12 sec for nodes 23 to connect, nodes [ all: 23 and 24 connected: 24 no-wait: ]
2018-11-17 20:41:45 [MgmtSrvr] INFO -- Node 24: Waiting 9 sec for nodes 23 to connect, nodes [ all: 23 and 24 connected: 24 no-wait: ]
2018-11-17 20:41:48 [MgmtSrvr] INFO -- Node 24: Waiting 6 sec for nodes 23 to connect, nodes [ all: 23 and 24 connected: 24 no-wait: ]
2018-11-17 20:41:51 [MgmtSrvr] INFO -- Node 24: Waiting 3 sec for nodes 23 to connect, nodes [ all: 23 and 24 connected: 24 no-wait: ]
The config file for my 4-node cluster (2 data nodes) is as follows:
DataDir=/usr/local/mysqlLinJiaXin/ndbdata
#1117 DataMemory=8000M
DataMemory=27212M
#IndexMemory=1000M
IndexMemory=2048M
##BackupMemory: 64M
## added by ljx
#add 1113
#TimeBetweenWatchDogCheck=60000
#TransactionDeadlockDetectionTimeout=5000
#LcpScanProgressTimeout=328
#1110 ChaYiYiBiaoZheng
#TimeBetweenLocalCheckpoints=10
#not work NoOfFragmentLogFiles=32
#ok MaxNoOfExecutionThreads=6
MaxNoOfExecutionThreads=10
DiskPageBufferMemory=160M
BackupDataDir=/usr/local/mysqlLinJiaXin/ndbBack
BackupDataBufferSize=160M
BackupLogBufferSize=32M
BackupMemory=192M
BackupWriteSize=2048K
BackupMaxWriteSize=8M
LockPagesInMainMemory=0
#MHX LockExecuteThreadToCPU=0
#MHX LockMaintThreadsToCPU=1
RealtimeScheduler=1
#1106Change to smaller than 851798
MaxNoOfConcurrentTransactions: 158098
#1106Change to smaller than 8517980
MaxNoOfConcurrentOperations: 180980
SchedulerExecutionTimer=10
SchedulerSpinTimer=100
#CompressedLCP=1
#CompressedBackup=1
#Enabling CompressedLCP and CompressedBackup causes, respectively, local
## Transaction Parameters #
#MaxNoOfConcurrentTransactions: 4096
#MaxNoOfConcurrentOperations: 100000
#1106Change to smaller than 110000
MaxNoOfLocalOperations: 325980
MaxNoOfTables = 1024
MaxNoOfAttributes = 100000
MaxNoOfOrderedIndexes = 10000
[MYSQLD DEFAULT]
[NDB_MGMD DEFAULT]
[TCP DEFAULT]
# added by ljx
#SendBufferMemory=2M
#ReceiveBufferMemory=2M
[NDB_MGMD]
Nodeid=22
# Management node server
HostName=192.168.70.12
PortNumber=8518
# Storage Engines
DataDir=/usr/local/mysqlLinJiaXin/mgmdata
[NDBD]
Nodeid=23
# IP address of MySQL Cluster db1
HostName=192.168.70.13
[NDBD]
Nodeid=24
# IP address of MySQL Cluster db2
HostName=192.168.70.14
[MYSQLD]
Nodeid=25
HostName=192.168.70.13
[MYSQLD]
Nodeid=26
HostName=192.168.70.14
[MYSQLD]
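For comparison against the physical RAM on each data node host, here is a rough sketch that adds up the memory parameters set explicitly in the config above. It is only a lower bound under my own assumptions: it ignores every buffer and default not listed in this file, and it counts BackupMemory once because its value (192M) equals BackupDataBufferSize plus BackupLogBufferSize. The function name `total_configured_mb` is hypothetical.

```python
# Values copied from the config above, in megabytes.
configured_mb = {
    "DataMemory": 27212,
    "IndexMemory": 2048,
    "DiskPageBufferMemory": 160,
    # Assumption: 192M already covers BackupDataBufferSize (160M)
    # plus BackupLogBufferSize (32M), so those are not added again.
    "BackupMemory": 192,
}


def total_configured_mb(params):
    """Sum the explicitly configured memory parameters (lower bound only)."""
    return sum(params.values())


total = total_configured_mb(configured_mb)
print(f"explicitly configured: {total} MB ({total / 1024:.1f} GB)")
```

Since LockPagesInMainMemory=0 is set, this memory is not locked in RAM, so the total is worth checking against what each data node host actually has.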
Has 7.5.12 fixed the problem above?
Oracle&MCluster lover: Georgelin
Edited 1 time(s). Last edit at 11/17/2018 07:15AM by Nimbi lin.