
Re: ndb watchdog overslept
Posted by: d g
Date: February 11, 2017 08:40PM

Hi Mikael,

First of all, thank you for your reply. I tried setting DiskPageBufferMemory to 4000M. After that, the dump aborted with another error, which said I should increase MaxNoOfConcurrentOperations. I had run into this error several times before: I first set MaxNoOfConcurrentOperations to 500000 in the config, then to 5000000, and after setting DiskPageBufferMemory to 4000M I increased it again to 10000000. After that I ran out of send buffer, so I increased SendBufferMemory to 64M (as well as ReceiveBufferMemory), and finally I had to increase TransactionDeadlockDetectionTimeout. Now the cluster is stopping again with the following error:

Feb 9 12:40:55 ndbdata01 ndbmtd: thr_no:13 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
Feb 9 12:40:55 ndbdata01 ndbmtd: 2017-02-09 12:40:55 [ndbd] WARNING -- thr: 9: Overslept 7273 ms, expected ~10ms
Feb 9 12:40:55 ndbdata01 ndbmtd: thr_no:13 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
Feb 9 12:40:55 ndbdata01 ndbmtd: thr_no:13 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
Feb 9 12:40:55 ndbdata01 ndbmtd: thr_no:13 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
Feb 9 12:40:55 ndbdata01 ndbmtd: 2017-02-09 12:40:55 [ndbd] WARNING -- thr: 11: Overslept 4415 ms, expected ~10ms
Feb 9 12:40:55 ndbdata01 ndbmtd: thr_no:13 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
Feb 9 12:40:55 ndbdata01 ndbmtd: 2017-02-09 12:40:55 [ndbd] INFO -- /export/home2/pb2/build/sb_1-21745070-1483721047.77/rpm/BUILD/mysql-cluster-gpl-7.5.5/mysql-cluster-gpl-7.5.5/storage/ndb/src/kernel/blocks/pgman.cpp
Feb 9 12:40:55 ndbdata01 ndbmtd: 2017-02-09 12:40:55 [ndbd] INFO -- PGMAN (Line: 556) 0x00000000 Check false failed
Feb 9 12:40:55 ndbdata01 ndbmtd: 2017-02-09 12:40:55 [ndbd] INFO -- Error handler restarting system
Feb 9 12:40:56 ndbdata01 ndbmtd: 2017-02-09 12:40:56 [ndbd] INFO -- Error handler shutdown completed - exiting
Feb 9 12:40:56 ndbdata01 ndbmtd: 2017-02-09 12:40:56 [ndbd] ALERT -- Node 3: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
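
To see which threads overslept and by how much, log lines like the ones above can be scanned with a small script. This is only a sketch: the function name worst_oversleep and the regular expression are illustrative assumptions based on the message format shown in this log, not part of any NDB tool.

```python
import re

# Matches data node warnings of the form seen in the log above:
#   "... [ndbd] WARNING -- thr: 9: Overslept 7273 ms, expected ~10ms"
# The pattern is an assumption derived from these lines only.
OVERSLEPT = re.compile(r"thr: (\d+): Overslept (\d+) ms")


def worst_oversleep(lines):
    """Return {thread number: longest oversleep in ms} for the given log lines."""
    worst = {}
    for line in lines:
        match = OVERSLEPT.search(line)
        if match:
            thr, ms = int(match.group(1)), int(match.group(2))
            worst[thr] = max(ms, worst.get(thr, 0))
    return worst


# Three lines copied from the log above; the sleeploop line is ignored.
sample = [
    "Feb 9 12:40:55 ndbdata01 ndbmtd: 2017-02-09 12:40:55 [ndbd] WARNING -- thr: 9: Overslept 7273 ms, expected ~10ms",
    "Feb 9 12:40:55 ndbdata01 ndbmtd: thr_no:13 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)",
    "Feb 9 12:40:55 ndbdata01 ndbmtd: 2017-02-09 12:40:55 [ndbd] WARNING -- thr: 11: Overslept 4415 ms, expected ~10ms",
]
print(worst_oversleep(sample))  # {9: 7273, 11: 4415}
```

Running it over the whole syslog file instead of the sample would show whether the same threads oversleep before every crash.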


Here is my modified config:



[ndbd default]
NoOfReplicas=2
DataMemory=90000M
IndexMemory=10000M
DiskPageBufferMemory=4000M
CompressedBackup=true
datadir=/var/lib/mysql-cluster
NoOfFragmentLogParts=10
MaxNoOfConcurrentOperations=10000000
MaxNoOfAttributes=100000
NoOfFragmentLogFiles=32
TimeBetweenLocalCheckpoints=26
TimeBetweenGlobalCheckpoints=10000
MaxDiskWriteSpeed=600M
MinDiskWriteSpeed=200M
MaxDiskWriteSpeedOwnRestart=300M
TransactionDeadlockDetectionTimeout=3000
StopOnError=0
ODirect=1
ThreadConfig=ldm={count=10,cpubind=0-4,12-16},tc={count=4,cpubind=6-7,18-19},send={count=1,cpubind=8},recv={count=1,cpubind=20},main={count=1,cpubind=9,21},rep={count=1,cpubind=9,21},io={count=1,cpubind=9,21},watchdog={count=1,cpubind=9,21}

[tcp default]
SendBufferMemory=64M
ReceiveBufferMemory=64M

[ndb_mgmd]
NodeId=1
hostname=172.16.17.11
datadir=/var/lib/mysql-cluster

[ndb_mgmd]
NodeId=2
hostname=172.16.17.12
datadir=/var/lib/mysql-cluster

[ndbd]
hostname=172.16.17.1

[ndbd]
hostname=172.16.17.2

[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]


It looks like it has something to do with MaxNoOfConcurrentOperations and DiskPageBufferMemory. For testing I also downgraded to 7.4.14, but the behavior is the same. My hardware should be fast enough: 24 (12) cores, 128 GB memory, 2500 GB disk space (RAID 5) at 700-800 MB/s. I tested earlier with 7.4.12 and had no problems; I noticed this behavior only after updating the system to 7.5.5. It seems that only one node crashes at a time, but if I test with only one node it crashes too, with the same log entries. Interestingly, the cluster again loses some data after a restart, though not all of it: the crash now happens somewhere between 4 and 6 percent data usage, and after the restart data usage is at 2 percent.
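
As a rough sanity check, the memory requested by the config above can be added up and compared with the 128 GB of RAM. This is a minimal sketch under stated assumptions: the figure of about 1 KB per operation record is an approximation, the 128 GB is assumed to be per data node host, the helper name estimate_mb is made up for illustration, and send/receive buffers, redo buffers and other overhead are ignored.

```python
# Values copied from the [ndbd default] section above, in megabytes.
configured_mb = {
    "DataMemory": 90000,
    "IndexMemory": 10000,
    "DiskPageBufferMemory": 4000,
}
max_concurrent_operations = 10_000_000  # MaxNoOfConcurrentOperations

# ASSUMPTION: roughly 1 KB per operation record. Treat this as an
# approximation and check the documentation for the version in use.
ASSUMED_BYTES_PER_OPERATION = 1024

# ASSUMPTION: the 128 GB mentioned in the post is per data node host.
physical_mb = 128 * 1024


def estimate_mb(configured, operations, bytes_per_operation):
    """Sum the configured pools plus the estimated operation records (MB)."""
    operation_mb = operations * bytes_per_operation / (1024 * 1024)
    return sum(configured.values()) + operation_mb


total_mb = estimate_mb(
    configured_mb, max_concurrent_operations, ASSUMED_BYTES_PER_OPERATION
)
share = 100 * total_mb / physical_mb
print(f"estimated {total_mb:.0f} MB of {physical_mb} MB ({share:.0f}%)")
```

The estimate only shows how much headroom is left for everything not counted here; it says nothing about whether the PGMAN failure itself is memory related.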



Error log:


Current byte-offset of file-pointer is: 1566


Time: Friday 10 February 2017 - 11:24:56
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug)
Error: 6000
Error data: Signal 6 received; Aborted
Error object: /export/home/pb2/build/sb_0-21747926-1483612889.86/rpm/BUILD/mysql-cluster-gpl-7.4.14/mysql-cluster-gpl-7.4.14/storage/ndb/src/kernel/ndbd.cpp
Program: ndbmtd
Pid: 10187 thr: 6
Version: mysql-5.6.35 ndb-7.4.14
Trace: /var/lib/mysql-cluste
Time: Friday 10 February 2017 - 14:27:42
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug)
Error: 6000
Error data: Signal 6 received; Aborted
Error object: /export/home/pb2/build/sb_0-21747926-1483612889.86/rpm/BUILD/mysql-cluster-gpl-7.4.14/mysql-cluster-gpl-7.4.14/storage/ndb/src/kernel/ndbd.cpp
Program: ndbmtd
Pid: 10487 thr: 11
Version: mysql-5.6.35 ndb-7.4.14
Trace: /var/lib/mysql-clust
Time: Friday 10 February 2017 - 18:14:51
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug)
Error: 6000
Error data: Signal 6 received; Aborted
Error object: /export/home/pb2/build/sb_0-21747926-1483612889.86/rpm/BUILD/mysql-cluster-gpl-7.4.14/mysql-cluster-gpl-7.4.14/storage/ndb/src/kernel/ndbd.cpp
Program: ndbmtd
Pid: 11860 thr: 4
Version: mysql-5.6.35 ndb-7.4.14
Trace: /var/lib/mysql-cluste


Maybe you will see something interesting. I will report a bug with that information, but it would be very helpful if there were a workaround for this.


Again, thank you for your time and help.

Regards
Denny
