Re: Mysql NDB cluster - Big import fail
To understand why mysqld loses connection with data nodes we would need to see the data node log files, ndb_*_error.log, ndb_*_out.log, ndb_*_trace.log*.
But I suspect that there are issues with the size of data node process, either killed by linux oom-daemon due to excessive memory usage, or timeouts due to slowed down internal processing caused by swapping.
The config.ini you posted will require close to 35GiB RAM for ndbmtd process and your machine only had 32GiB.
I suspect most of the configuration are done to manage the initial bulk load, rather than manage the "real" workload, and by using any of ways suggested by Jon and Ted the configuration could be simplified and use less memory since typically MaxXXX-parameters will reserve memory proportional to the setting.
With smaller transaction you could probably skip the Operations-settings
MaxNoOfConcurrentOperations = 1629496
MaxDMLOperationsPerTransaction = 1629400
The bulk load uses one transaction at a time, and the default should be enough for most workload, remove:
MaxNoOfConcurrentTransactions = 1629496
With smaller transactions could probably decrease SharedGlobalMemory a lot (maybe even the default of 128MB works but probably ok to set it to 1-2 GB).
SharedGlobalMemory = 10G
Do you need these?
MaxNoOfOrderedIndexes = 1000
MaxNoOfAttributes = 100000
With smaller transaction you may not need to increase tcp buffers either?
#hay un error de Out of SendBufferMemory in sendSignal
If it is ok to remove the above configuration the data node RAM requirements shrinks from 35Gib to 19GiB.
$ ps -opid,fname,rsz,vsz $(pgrep ndbmtd) # posted config.ini
PID COMMAND RSZ VSZ
8668 ndbmtd 3344 22320
8669 ndbmtd 36228996 36578412
$ ps -opid,fname,rsz,vsz $(pgrep ndbmtd) # stripped config.ini
PID COMMAND RSZ VSZ
8774 ndbmtd 3416 22320
8775 ndbmtd 19247136 19553616
Depending on schema and future load I suspect that you may need to increase DataMemory. And as noted by Jon and Ted, IndexMemory is deprecated, instead add that memory to DataMemory.
I guess increasing deadlock timeout also was an attempt to cope with big bulk load transaction, with smaller transaction I guess this can be removed too.
TransactionDeadlockDetectionTimeout = 10000
The default REDO log is 1GB which is probably to small to manage the initial bulk load of 12GB, increasing it to 16 or 32GB should probably be enough. Can you spare 128GB on disk for REDO log your setting will work too.
#temporal para import inicial
NoOfFragmentLogFiles = 1000
FragmentLogFileSize = 32M
Note that a local checkpoint contains whole transactions, and in your case the transactions have been bigger than 1MB and setting TimeBetweenLocalCheckpoints = 2 (4bytes) instead of the default 20 (1MB) will have no effect.
#para evitar REDO log files overloaded error
TimeBetweenLocalCheckpoints = 2
Btw, another way to be able to run dump files with big insert statements is to turn off ndb transactions which allows a multi row insert to be executed in parts. This can either be done within a session
> SET ndb_use_transactions=0;
> source dump.txt;
> SET ndb_use_transactions=1;
Or from command line:
$ mysql --init-command='SET ndb_use_transactions=0' < dump.txt