ndb watchdog overslept
Posted by:
d g
Date: February 08, 2017 11:44AM
Hi
I upgraded my mysql-cluster to version 7.5.5. Its a testing system and i'm trying to restore a dump from InnoDB and Myisam table, i replaced the engine via regex with ndb and added a tablespace. after a few table the whole cluster crashes with following error:
ERROR 1297 (HY000) at line 1530: Got temporary error 4010 'Node failure caused abort of transaction' from NDBCLUSTER
the log file from node 1 says :
Free extent: [ m_page_no: 7628 m_file_no: 0 m_page_idx: 1599]
2017-02-08 17:28:21 [ndbd] INFO -- DROP_TAB_REQ: tab: 149, tabLcpStatus: 3
Started thread, index = 64, id = 98724, type = NdbfsThread
2017-02-08 17:30:12 [ndbd] INFO -- Locked tid = 98724 to CPU ok
Started thread, index = 65, id = 98725, type = NdbfsThread
2017-02-08 17:30:12 [ndbd] INFO -- Locked tid = 98725 to CPU ok
Started thread, index = 66, id = 98726, type = NdbfsThread
2017-02-08 17:30:12 [ndbd] INFO -- Locked tid = 98726 to CPU ok
NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer
Started thread, index = 67, id = 98727, type = NdbfsThread
2017-02-08 17:30:12 [ndbd] INFO -- Locked tid = 98727 to CPU ok
NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer
Started thread, index = 68, id = 98728, type = NdbfsThread
2017-02-08 17:30:53 [ndbd] INFO -- Locked tid = 98728 to CPU ok
NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer
Started thread, index = 69, id = 98729, type = NdbfsThread
2017-02-08 17:33:17 [ndbd] INFO -- Locked tid = 98729 to CPU ok
2017-02-08 17:55:01 [ndbd] WARNING -- thr: 10: Overslept 1654 ms, expected ~10ms
2017-02-08 17:55:01 [ndbd] WARNING -- thr: 9: Overslept 1662 ms, expected ~10ms
2017-02-08 17:55:01 [ndbd] WARNING -- thr: 3: Overslept 1732 ms, expected ~10ms
2017-02-08 17:55:01 [ndbd] WARNING -- thr: 11: Overslept 1663 ms, expected ~10ms
2017-02-08 17:55:01 [ndbd] WARNING -- thr: 4: Overslept 1741 ms, expected ~10ms
2017-02-08 17:55:01 [ndbd] WARNING -- thr: 5: Overslept 1714 ms, expected ~10ms
2017-02-08 17:55:13 [ndbd] WARNING -- thr: 5: Overslept 1823 ms, expected ~10ms
2017-02-08 17:55:13 [ndbd] WARNING -- thr: 9: Overslept 2260 ms, expected ~10ms
2017-02-08 17:55:13 [ndbd] WARNING -- thr: 10: Overslept 2274 ms, expected ~10ms
2017-02-08 17:55:13 [ndbd] WARNING -- thr: 4: Overslept 1760 ms, expected ~10ms
2017-02-08 17:55:13 [ndbd] WARNING -- thr: 7: Overslept 2366 ms, expected ~10ms
2017-02-08 17:55:13 [ndbd] WARNING -- thr: 3: Overslept 1775 ms, expected ~10ms
2017-02-08 17:55:13 [ndbd] WARNING -- thr: 11: Overslept 1745 ms, expected ~10ms
2017-02-08 17:55:13 [ndbd] WARNING -- thr: 6: Overslept 2391 ms, expected ~10ms
2017-02-08 17:55:13 [ndbd] WARNING -- thr: 8: Overslept 1807 ms, expected ~10ms
2017-02-08 17:55:13 [ndbd] WARNING -- thr: 2: Overslept 2576 ms, expected ~10ms
2017-02-08 17:56:17 [ndbd] WARNING -- thr: 6: Overslept 1711 ms, expected ~10ms
2017-02-08 17:56:17 [ndbd] WARNING -- thr: 4: Overslept 1788 ms, expected ~10ms
2017-02-08 17:56:18 [ndbd] WARNING -- thr: 11: Overslept 2144 ms, expected ~10ms
2017-02-08 17:56:18 [ndbd] WARNING -- thr: 3: Overslept 2164 ms, expected ~10ms
2017-02-08 17:56:18 [ndbd] WARNING -- thr: 2: Overslept 2184 ms, expected ~10ms
2017-02-08 17:56:19 [ndbd] WARNING -- thr: 9: Overslept 1513 ms, expected ~10ms
2017-02-08 18:26:55 [ndbd] INFO -- /export/home2/pb2/build/sb_1-21745070-1483721047.77/rpm/BUILD/mysql-cluster-gpl-7.5.5/mysql-cluster-gpl-7.5.5/storage/ndb/src/kernel/blocks/pgman.cpp
2017-02-08 18:26:55 [ndbd] INFO -- PGMAN (Line: 556) 0x00000004 Check false failed
2017-02-08 18:26:55 [ndbd] INFO -- Error handler restarting system
2017-02-08 18:26:55 [ndbd] INFO -- Error handler shutdown completed - exiting
2017-02-08 18:26:55 [ndbd] ALERT -- Node 3: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
log from node 2:
Free extent: [ m_page_no: 7628 m_file_no: 0 m_page_idx: 1599]
2017-02-08 17:28:21 [ndbd] INFO -- DROP_TAB_REQ: tab: 149, tabLcpStatus: 3
Started thread, index = 64, id = 98724, type = NdbfsThread
2017-02-08 17:30:12 [ndbd] INFO -- Locked tid = 98724 to CPU ok
Started thread, index = 65, id = 98725, type = NdbfsThread
2017-02-08 17:30:12 [ndbd] INFO -- Locked tid = 98725 to CPU ok
Started thread, index = 66, id = 98726, type = NdbfsThread
2017-02-08 17:30:12 [ndbd] INFO -- Locked tid = 98726 to CPU ok
NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer
Started thread, index = 67, id = 98727, type = NdbfsThread
2017-02-08 17:30:12 [ndbd] INFO -- Locked tid = 98727 to CPU ok
NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer
Started thread, index = 68, id = 98728, type = NdbfsThread
2017-02-08 17:30:53 [ndbd] INFO -- Locked tid = 98728 to CPU ok
NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer
Started thread, index = 69, id = 98729, type = NdbfsThread
2017-02-08 17:33:17 [ndbd] INFO -- Locked tid = 98729 to CPU ok
2017-02-08 17:55:01 [ndbd] WARNING -- thr: 10: Overslept 1654 ms, expected ~10ms
2017-02-08 17:55:01 [ndbd] WARNING -- thr: 9: Overslept 1662 ms, expected ~10ms
2017-02-08 17:55:01 [ndbd] WARNING -- thr: 3: Overslept 1732 ms, expected ~10ms
2017-02-08 17:55:01 [ndbd] WARNING -- thr: 11: Overslept 1663 ms, expected ~10ms
2017-02-08 17:55:01 [ndbd] WARNING -- thr: 4: Overslept 1741 ms, expected ~10ms
2017-02-08 17:55:01 [ndbd] WARNING -- thr: 5: Overslept 1714 ms, expected ~10ms
2017-02-08 17:55:13 [ndbd] WARNING -- thr: 5: Overslept 1823 ms, expected ~10ms
2017-02-08 17:55:13 [ndbd] WARNING -- thr: 9: Overslept 2260 ms, expected ~10ms
2017-02-08 17:55:13 [ndbd] WARNING -- thr: 10: Overslept 2274 ms, expected ~10ms
2017-02-08 17:55:13 [ndbd] WARNING -- thr: 4: Overslept 1760 ms, expected ~10ms
2017-02-08 17:55:13 [ndbd] WARNING -- thr: 7: Overslept 2366 ms, expected ~10ms
2017-02-08 17:55:13 [ndbd] WARNING -- thr: 3: Overslept 1775 ms, expected ~10ms
2017-02-08 17:55:13 [ndbd] WARNING -- thr: 11: Overslept 1745 ms, expected ~10ms
2017-02-08 17:55:13 [ndbd] WARNING -- thr: 6: Overslept 2391 ms, expected ~10ms
2017-02-08 17:55:13 [ndbd] WARNING -- thr: 8: Overslept 1807 ms, expected ~10ms
2017-02-08 17:55:13 [ndbd] WARNING -- thr: 2: Overslept 2576 ms, expected ~10ms
2017-02-08 17:56:17 [ndbd] WARNING -- thr: 6: Overslept 1711 ms, expected ~10ms
2017-02-08 17:56:17 [ndbd] WARNING -- thr: 4: Overslept 1788 ms, expected ~10ms
2017-02-08 17:56:18 [ndbd] WARNING -- thr: 11: Overslept 2144 ms, expected ~10ms
2017-02-08 17:56:18 [ndbd] WARNING -- thr: 3: Overslept 2164 ms, expected ~10ms
2017-02-08 17:56:18 [ndbd] WARNING -- thr: 2: Overslept 2184 ms, expected ~10ms
2017-02-08 17:56:19 [ndbd] WARNING -- thr: 9: Overslept 1513 ms, expected ~10ms
2017-02-08 18:26:55 [ndbd] INFO -- /export/home2/pb2/build/sb_1-21745070-1483721047.77/rpm/BUILD/mysql-cluster-gpl-7.5.5/mysql-cluster-gpl-7.5.5/storage/ndb/src/kernel/blocks/pgman.cpp
2017-02-08 18:26:55 [ndbd] INFO -- PGMAN (Line: 556) 0x00000004 Check false failed
2017-02-08 18:26:55 [ndbd] INFO -- Error handler restarting system
2017-02-08 18:26:55 [ndbd] INFO -- Error handler shutdown completed - exiting
2017-02-08 18:26:55 [ndbd] ALERT -- Node 3: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
all data is lost after the crash see:
Node 3: Data usage is 0%(9013 32K pages of total 3200000)
Node 3: Index usage is 0%(5593 8K pages of total 1280320)
Node 4: Data usage is 0%(9013 32K pages of total 3200000)
Node 4: Index usage is 0%(5593 8K pages of total 1280320)
before the crash usage was 10%.
my config :
[ndbd default]
NoOfReplicas=2
DataMemory=100000M
IndexMemory=10000M
CompressedBackup=true
datadir=/var/lib/mysql-cluster
NoOfFragmentLogParts=10
MaxNoOfConcurrentOperations=5000000
MaxNoOfAttributes=100000
NoOfFragmentLogFiles=32
StopOnError=0
ThreadConfig=ldm={count=10,cpubind=0-4,12-16},tc={count=4,cpubind=6-7,18-19},send={count=1,cpubind=8},recv={count=1,cpubind=20},main={count=1,cpubind=9,21},rep={count=1,cpubind=9,21},io={count=1,cpubind=9,21},watchdog={count=1,cpubind=9,21}
[tcp default]
SendBufferMemory=16M
ReceiveBufferMemory=16M
[ndb_mgmd]
NodeId=1
hostname=172.16.17.11
datadir=/var/lib/mysql-cluster
[ndb_mgmd]
NodeId=2
hostname=172.16.17.12
datadir=/var/lib/mysql-cluster
[ndbd]
hostname=172.16.17.1
[ndbd]
hostname=172.16.17.2
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
my.cnf on myqsl nodes:
[mysqld]
ndbcluster
bind-address = 0.0.0.0
max-allowed-packet = 128M
ndb-cluster-connection-pool=4
default_storage_engine=ndbcluster
optimizer_switch=engine_condition_pushdown=on