Got temporary error 899 'Rowid already allocated' from NDBCLUSTER
Posted by: Diego Mendoza
Date: May 18, 2017 05:02PM

Hi, I have 2 data nodes, each with 64 GiB of RAM and 16 cores, running RHEL 7.2.

The DB size is about 20 GiB, and two of the tables carry most of the transaction load (about 4691 records per minute), both reads and writes (updates and inserts).

At some point during heavy load, the app got this error: Got temporary error 899 'Rowid already allocated' from NDBCLUSTER.

Then I took node 3 down (see the configuration below) and everything returned to normal. I did this because that node had gone down a few hours earlier. The error disappeared completely and has not happened again so far.

HW and SW metrics stayed within normal ranges during the event, so no saturation was noticed at all.

At that time, the logs on the data nodes showed the following:
DB03-01
-------------------------------------------------
2017-05-17 15:31:49 [ndbd] WARNING -- Ndb kernel thread 2 is stuck in: Job Handling elapsed=601
2017-05-17 15:31:49 [ndbd] INFO -- Watchdog: User time: 560428 System time: 42380
2017-05-17 15:31:49 [ndbd] WARNING -- Ndb kernel thread 2 is stuck in: Job Handling elapsed=701
2017-05-17 15:31:49 [ndbd] INFO -- Watchdog: User time: 560428 System time: 42380
2017-05-17 15:31:49 [ndbd] WARNING -- Ndb kernel thread 2 is stuck in: Job Handling elapsed=802
2017-05-17 15:31:49 [ndbd] INFO -- Watchdog: User time: 560428 System time: 42380
2017-05-17 15:31:49 [ndbd] WARNING -- Ndb kernel thread 2 is stuck in: Job Handling elapsed=902
2017-05-17 15:31:49 [ndbd] INFO -- Watchdog: User time: 560428 System time: 42380
2017-05-17 15:31:49 [ndbd] WARNING -- Ndb kernel thread 2 is stuck in: Job Handling elapsed=1002
2017-05-17 15:31:49 [ndbd] INFO -- Watchdog: User time: 560428 System time: 42380
2017-05-17 15:31:49 [ndbd] WARNING -- Ndb kernel thread 2 is stuck in: Job Handling elapsed=1102
2017-05-17 15:31:49 [ndbd] INFO -- Watchdog: User time: 560428 System time: 42381
2017-05-17 15:31:49 [ndbd] WARNING -- Ndb kernel thread 2 is stuck in: Job Handling elapsed=1202
2017-05-17 15:31:49 [ndbd] INFO -- Watchdog: User time: 560428 System time: 42381
2017-05-17 16:40:44 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=100
2017-05-17 16:40:44 [ndbd] INFO -- Watchdog: User time: 837421 System time: 70600
2017-05-17 16:40:44 [ndbd] INFO -- timerHandlingLab, expected 10ms sleep, not scheduled for: 659 (ms)
2017-05-17 16:40:44 [ndbd] INFO -- Watchdog: User time: 837431 System time: 70601
2017-05-17 16:40:44 [ndbd] WARNING -- Watchdog: Warning overslept 708 ms, expected 100 ms.
2017-05-17 17:01:43 [ndbd] INFO -- Watchdog: User time: 921892 System time: 80433
2017-05-17 17:01:43 [ndbd] WARNING -- Watchdog: Warning overslept 511 ms, expected 100 ms.
2017-05-17 17:01:43 [ndbd] INFO -- timerHandlingLab, expected 10ms sleep, not scheduled for: 468 (ms)
2017-05-17 17:01:43 [ndbd] INFO -- timerHandlingLab, expected 10ms sleep, not scheduled for: 666 (ms)
2017-05-17 17:01:43 [ndbd] INFO -- Watchdog: User time: 921897 System time: 80438
2017-05-17 17:01:43 [ndbd] WARNING -- Watchdog: Warning overslept 766 ms, expected 100 ms.
-------------------------------------------------

DB03-02
-------------------------------------------------
2017-05-17 14:22:42 [ndbd] INFO -- granting SumaStartMe dict lock to 3
prepare to handover bucket: 0
14554448/0 (14554447/4294967295) switchover complete bucket 0 state: 2
handover
2017-05-17 14:22:47 [ndbd] INFO -- clearing SumaStartMe dict lock for 3
2017-05-17 14:22:47 [ndbd] INFO -- NR Status: node=3,OLD=Wait handover of subscriptions,NEW=Restart completed
2017-05-17 14:22:47 [ndbd] INFO -- GCP Monitor: Computed max GCP_COMMIT lag to 48 seconds
2017-05-17 14:22:47 [ndbd] INFO -- GCP Monitor: Computed max GCP_SAVE lag to 154 seconds
2017-05-17 14:22:47 [ndbd] INFO -- Adjusting disk write speed bounds due to : Node restart finished
2017-05-17 15:42:10 [ndbd] WARNING -- Ndb kernel thread 2 is stuck in: Job Handling elapsed=100
2017-05-17 15:42:10 [ndbd] INFO -- Watchdog: User time: 5470464 System time: 727558
2017-05-17 15:42:10 [ndbd] WARNING -- thr: 2: Overslept 2055 ms, expected ~10ms
2017-05-17 15:42:10 [ndbd] INFO -- Watchdog: User time: 5470469 System time: 727560
2017-05-17 15:42:10 [ndbd] WARNING -- Watchdog: Warning overslept 2352 ms, expected 100 ms.
Backup : Excessive Backup/LCP write rate in last monitoring period - recorded = 13617870 bytes/s,
Current speed is = 10485760 bytes/s
Backup : Monitoring period : 1078 millis. Bytes written : 14680064. Max allowed : 12582912
Actual number of periods in this monitoring interval: 22 calculated number was: 11
-------------------------------------------------
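To make the stalls in the logs above easier to compare, here is a minimal Python sketch that pulls the largest reported delay per timestamp out of watchdog-style lines. The regular expressions are written against the lines quoted above only, the function name `summarize_stalls` is made up for illustration, and it assumes the `elapsed=` value is in milliseconds like the other figures.

```python
import re
from collections import defaultdict

# Patterns written against the ndbd log lines quoted in this post.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ")
OVERSLEPT = re.compile(r"[Oo]verslept (\d+) ms")
NOT_SCHEDULED = re.compile(r"not scheduled for: (\d+) \(ms\)")
# Assumption: the elapsed value is reported in milliseconds.
STUCK = re.compile(r"is stuck in: .+? elapsed=(\d+)")


def summarize_stalls(lines):
    """Return {timestamp: largest delay in ms} for lines that report a stall."""
    worst = defaultdict(int)
    for line in lines:
        ts = TIMESTAMP.match(line)
        if not ts:
            continue  # continuation lines without a timestamp are skipped
        for pattern in (OVERSLEPT, NOT_SCHEDULED, STUCK):
            m = pattern.search(line)
            if m:
                key = ts.group(1)
                worst[key] = max(worst[key], int(m.group(1)))
    return dict(worst)


if __name__ == "__main__":
    import sys

    # Usage: python stalls.py < ndb_4_out.log
    for when, ms in sorted(summarize_stalls(sys.stdin).items()):
        print(f"{when}  worst delay {ms} ms")
```

Run against the DB03-02 excerpt, this would report 2352 ms as the worst delay at 15:42:10, which is the figure from the "Watchdog: Warning overslept" line.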

There are no errors in ndb_3_error.log or ndb_4_out.log.

Current running version is this: mysql-5.6.34 ndb-7.4.13

Configuration is this:
-------------------------------------------------
[ndb_mgmd default]
DataDir=/var/lib/mysql-cluster
PortNumber=1296

[ndb_mgmd]
NodeId=1
HostName=APP06-01
LogDestination=FILE:filename=ndb_1_cluster.log,maxsize=10000000,maxfiles=6

[ndb_mgmd]
NodeId=2
HostName=APP06-02
LogDestination=FILE:filename=ndb_2_cluster.log,maxsize=10000000,maxfiles=6

[ndbd default]
NoOfReplicas=2
DataMemory=57344M
IndexMemory=4096M
MaxNoOfOrderedIndexes=8192
MaxNoOfUniqueHashIndexes=8192
MaxNoOfTriggers=196608
MaxNoOfFiredTriggers=196608
MaxNoOfTables=4096
MaxNoOfAttributes=10240
DataDir=/data/mysql-cluster
MaxNoOfConcurrentTransactions=100000
MaxNoOfConcurrentOperations=550000
MaxNoOfLocalOperations=165000
MaxBufferedEpochs=550
TimeBetweenEpochsTimeout=16000
TimeBetweenLocalCheckpoints=1
NoOfFragmentLogFiles=128
FragmentLogFileSize=32M
TransactionDeadlockDetectionTimeout=4500000

[TCP DEFAULT]
SendBufferMemory=128M
ReceiveBufferMemory=128M

[ndbd]
NodeId=3
HostName=DB03-01
ServerPort=50701

[ndbd]
NodeId=4
HostName=DB03-02
ServerPort=50702

[mysqld]
NodeId=5

[mysqld]
NodeId=6
-------------------------------------------------
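As a rough sanity check of the memory figures in the configuration above, the following Python sketch adds up DataMemory and IndexMemory against the 64 GiB of RAM per data node and computes the total redo log size. The helper names are made up for illustration. The factor of 4 assumes the documented layout of four files per fragment log file set. The headroom figure ignores every other ndbd allocation (send and receive buffers, operation records, and so on), so the real margin is smaller. This is arithmetic only, not a diagnosis.

```python
# Values copied from the [ndbd default] section above.
CONFIG = {
    "DataMemory": "57344M",
    "IndexMemory": "4096M",
    "NoOfFragmentLogFiles": "128",
    "FragmentLogFileSize": "32M",
}

HOST_RAM_MIB = 64 * 1024  # 64 GiB per data node, as stated in the post


def mib(value):
    """Convert a value such as '57344M' to MiB (only the M suffix is handled)."""
    if not value.endswith("M"):
        raise ValueError(f"unsupported unit in {value!r}")
    return int(value[:-1])


def memory_budget(cfg, host_ram_mib):
    """Return (DataMemory + IndexMemory, remaining host RAM), both in MiB."""
    reserved = mib(cfg["DataMemory"]) + mib(cfg["IndexMemory"])
    return reserved, host_ram_mib - reserved


def redo_log_mib(cfg):
    """Total redo log size in MiB: file sets x 4 files per set x file size."""
    return int(cfg["NoOfFragmentLogFiles"]) * 4 * mib(cfg["FragmentLogFileSize"])


if __name__ == "__main__":
    reserved, headroom = memory_budget(CONFIG, HOST_RAM_MIB)
    print(f"DataMemory + IndexMemory: {reserved} MiB")    # 61440 MiB (60 GiB)
    print(f"Left for everything else: {headroom} MiB")    # 4096 MiB (4 GiB)
    print(f"Redo log on disk: {redo_log_mib(CONFIG)} MiB")  # 16384 MiB (16 GiB)
```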

Any help to troubleshoot this is welcome.

Thanks in advance.
