MySQL Forums

Error 4009
Posted by: Nisa Apriliyanti
Date: January 05, 2024 09:35PM

Hello,

I am building a project that uses 8 data nodes and 47 SQL nodes. Here is the config.ini on the NDB management server:

[MGM]
DataDir=/usr/clustergo
HostName=192.168.1.1

[DB default]
DataDir=/usr/local/mysql/cluster
NoOfReplicas=2

## MEMORY MANAGEMENT
DataMemory=200G
StringMemory=25

## TRANSACTION
MaxNoOfConcurrentTransactions=4000000
MaxNoOfConcurrentOperations=8000000

## TRANSACTION RESOURCE ALLOCATION
TransactionMemory=20G

## LOGGING AND CHECK POINTS
FragmentLogFileSize=32M
NoOfFragmentLogFiles=600
EnableRedoControl=1

## METADATA OBJECTS: it defines pool sizes for metadata objects.
MaxNoOfTables=10240
MaxNoOfOrderedIndexes=2560
MaxNoOfUniqueHashIndexes=2560
MaxNoOfAttributes=1200000
MaxNoOfTriggers=2560
MaxNoOfConcurrentSubOperations=1024

## BOOLEAN PARAMETERS
CompressedLCP=1
LockPagesInMainMemory=1
ODirect=1

## CONTROLLING TIMEOUT, INTERVALS AND DISK PAGING
TimeBetweenInactiveTransactionAbortCheck=4000
TransactionDeadlockDetectionTimeout=12000

## Buffering and logging
RedoBuffer=128M

## CONTROLLING LOG MESSAGE
LogLevelStartup=1
LogLevelShutdown=1
LogLevelCheckpoint=8
LogLevelNodeRestart=15
LogLevelConnection=8
LogLevelError=8
LogLevelCongestion=1
MemReportFrequency=30

## BACKUP CONFIG
BackupMaxWriteSize=10M
BackupDataBufferSize=26M
BackupLogBufferSize=32M
BackupReportFrequency=10

## MULTITHREADING CONFIG
AutomaticThreadConfig=1

## PARAMETER FOR SEND BUFFER MEMORY ALLOCATION

TotalSendBufferMemory=128M

##DISK DATA BUFFERING PARAMETER
DiskPageBufferEntries=100
DiskPageBufferMemory=1024M
SharedGlobalMemory=160M


## NDB CLUSTER REALTIME PERFORMANCE
SpinMethod=DatabaseMachineSpinning
SchedulerExecutionTimer=100

[DB]
HostName=192.168.1.2
LockExecuteThreadToCPU=1
LockMaintThreadsToCPU=0

[DB]
HostName=192.168.1.3
LockExecuteThreadToCPU=1
LockMaintThreadsToCPU=0
.
. ## and so on
.
[API]
NodeId=46

[API]
NodeId=47
HostName=192.168.1.30

###

Whenever there is a large request, we get error 4009, as if the API (SQL node) suddenly cannot connect to the DB (data node).

Here is the log from ndb_1_cluster.log (NDB management server):
2024-01-05 16:50:16 [MgmtSrvr] INFO -- Node 4: Index usage is 0%(20523 32K pages of total 5735985)
2024-01-05 16:50:19 [MgmtSrvr] INFO -- Node 3: Data usage is 12%(816695 32K pages of total 6533144)
2024-01-05 16:50:19 [MgmtSrvr] INFO -- Node 3: Index usage is 0%(20456 32K pages of total 5736905)
2024-01-05 16:50:30 [MgmtSrvr] ALERT -- Node 9: Node 47 Disconnected
2024-01-05 16:50:30 [MgmtSrvr] INFO -- Node 9: Communication to Node 47 closed
2024-01-05 16:50:30 [MgmtSrvr] INFO -- Node 7: Communication to Node 47 closed
2024-01-05 16:50:30 [MgmtSrvr] INFO -- Node 3: Communication to Node 47 closed
2024-01-05 16:50:30 [MgmtSrvr] INFO -- Node 6: Communication to Node 47 closed
2024-01-05 16:50:30 [MgmtSrvr] INFO -- Node 4: Communication to Node 47 closed
2024-01-05 16:50:30 [MgmtSrvr] INFO -- Node 5: Communication to Node 47 closed
2024-01-05 16:50:30 [MgmtSrvr] INFO -- Node 2: Communication to Node 47 closed
2024-01-05 16:50:30 [MgmtSrvr] ALERT -- Node 2: Node 47 Disconnected
2024-01-05 16:50:30 [MgmtSrvr] INFO -- Node 8: Communication to Node 47 closed
2024-01-05 16:50:30 [MgmtSrvr] ALERT -- Node 6: Node 47 Disconnected
2024-01-05 16:50:30 [MgmtSrvr] ALERT -- Node 7: Node 47 Disconnected
2024-01-05 16:50:30 [MgmtSrvr] ALERT -- Node 4: Node 47 Disconnected
2024-01-05 16:50:30 [MgmtSrvr] ALERT -- Node 5: Node 47 Disconnected
2024-01-05 16:50:30 [MgmtSrvr] ALERT -- Node 8: Node 47 Disconnected
2024-01-05 16:50:30 [MgmtSrvr] ALERT -- Node 3: Node 47 Disconnected
2024-01-05 16:50:31 [MgmtSrvr] INFO -- Node 6: Data usage is 12%(821829 32K pages of total 6532706)
2024-01-05 16:50:31 [MgmtSrvr] INFO -- Node 6: Index usage is 0%(20894 32K pages of total 5731771)
2024-01-05 16:50:33 [MgmtSrvr] INFO -- Node 6: Communication to Node 47 opened
2024-01-05 16:50:33 [MgmtSrvr] INFO -- Node 3: Communication to Node 47 opened
2024-01-05 16:50:33 [MgmtSrvr] INFO -- Node 8: Communication to Node 47 opened
2024-01-05 16:50:33 [MgmtSrvr] INFO -- Node 5: Communication to Node 47 opened
2024-01-05 16:50:33 [MgmtSrvr] INFO -- Node 9: Communication to Node 47 opened
2024-01-05 16:50:33 [MgmtSrvr] INFO -- Node 7: Communication to Node 47 opened
2024-01-05 16:50:33 [MgmtSrvr] INFO -- Node 2: Communication to Node 47 opened
2024-01-05 16:50:34 [MgmtSrvr] INFO -- Node 4: Communication to Node 47 opened
2024-01-05 16:50:34 [MgmtSrvr] INFO -- Node 9: Node 47 Connected
2024-01-05 16:50:34 [MgmtSrvr] INFO -- Node 6: Node 47 Connected
2024-01-05 16:50:34 [MgmtSrvr] INFO -- Node 8: Node 47 Connected
2024-01-05 16:50:34 [MgmtSrvr] INFO -- Node 9: Node 47: API mysql-8.0.35 ndb-8.0.35
2024-01-05 16:50:34 [MgmtSrvr] INFO -- Node 8: Node 47: API mysql-8.0.35 ndb-8.0.35
2024-01-05 16:50:34 [MgmtSrvr] INFO -- Node 6: Node 47: API mysql-8.0.35 ndb-8.0.35
2024-01-05 16:50:34 [MgmtSrvr] INFO -- Node 2: Node 47 Connected
2024-01-05 16:50:34 [MgmtSrvr] INFO -- Node 7: Node 47 Connected
2024-01-05 16:50:34 [MgmtSrvr] INFO -- Node 7: Node 47: API mysql-8.0.35 ndb-8.0.35
2024-01-05 16:50:34 [MgmtSrvr] INFO -- Node 2: Node 47: API mysql-8.0.35 ndb-8.0.35
2024-01-05 16:50:35 [MgmtSrvr] INFO -- Node 3: Node 47 Connected
2024-01-05 16:50:35 [MgmtSrvr] INFO -- Node 4: Node 47 Connected
2024-01-05 16:50:35 [MgmtSrvr] INFO -- Node 5: Node 47 Connected
2024-01-05 16:50:35 [MgmtSrvr] INFO -- Node 4: Node 47: API mysql-8.0.35 ndb-8.0.35
2024-01-05 16:50:35 [MgmtSrvr] INFO -- Node 3: Node 47: API mysql-8.0.35 ndb-8.0.35
2024-01-05 16:50:35 [MgmtSrvr] INFO -- Node 5: Node 47: API mysql-8.0.35 ndb-8.0.35
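
To put a number on how long each data node lost contact with an API node, the "Disconnected" and "Connected" lines above can be paired up. This is only a rough sketch in Python, assuming the cluster log keeps exactly the line format shown in this excerpt; the function name summarize_disconnects is made up for illustration and is not part of any NDB tool.

```python
import re
from datetime import datetime

# Matches lines such as:
#   2024-01-05 16:50:30 [MgmtSrvr] ALERT -- Node 9: Node 47 Disconnected
#   2024-01-05 16:50:34 [MgmtSrvr] INFO -- Node 9: Node 47 Connected
# It deliberately skips "Communication to Node ..." and "Node 47: API ..." lines.
EVENT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[MgmtSrvr\] \w+\s+-- "
    r"Node (\d+): Node (\d+) (Disconnected|Connected)\s*$"
)


def summarize_disconnects(lines):
    """Return a list of (api_node, data_node, seconds_disconnected) tuples.

    Hypothetical helper: each "Disconnected" event reported by a data node
    is paired with the next "Connected" event for the same node pair.
    """
    down = {}     # (data_node, api_node) -> time the disconnect was reported
    outages = []  # completed disconnect windows, in order of reconnection
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match is None:
            continue
        when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        data_node = int(match.group(2))
        api_node = int(match.group(3))
        key = (data_node, api_node)
        if match.group(4) == "Disconnected":
            down[key] = when
        elif key in down:
            seconds = (when - down.pop(key)).total_seconds()
            outages.append((api_node, data_node, seconds))
    return outages
```

Passing an open file handle for ndb_1_cluster.log to summarize_disconnects yields one tuple per data node and outage. For the excerpt above, node 47 was gone for 4 to 5 seconds from the point of view of every data node, which reads more like the API node dropping and reconnecting as a whole than like a single bad link.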


So my guess is that there is something wrong with the connection between the API and DB nodes, even though they are in the same IP block and there is no firewall between them.
I have checked that Ubuntu 20 does not have SELinux and that the UFW status is inactive. I have also not set up any iptables rules.

As for the application, it is a web application. Although it is in a different IP block (192.168.2.xx), it can ping the cluster in under 1 ms. There is no firewall, SELinux, UFW, or iptables there either.

Even so, a long query can take anywhere from 4 seconds to 4 minutes to complete, even when traffic is not heavy.

At peak times, users often hit error 4009, which then leads to error codes 1296 and 1297, or even "MySQL server has gone away".
I have been working on this for 2 days and cannot find the cause.
Please help. Thank you.
