Re: Analysis: INSERT ON DUPLICATE UPDATE VS UPDATE
Hello,
Something that is really bad for NDB internal batching is INSERTS with dependencies, like "INSERT IGNORE" or "INSERT ON DUPLICATE KEY".
Using "INSERT IGNORE" or "INSERT ON DUPLICATE KEY" you break batching in NDB Cluster, all ROWS are inserted one-by-one and not in bathes this is why NDB is slow in this case.
The solution is not handle this in the application code, fetch all the rows you want to insert then run a set of UPDATE/INSERT statements.
Simple test with "INSERT IGNORE":
1) Create simple test table
CREATE TABLE subscribers (
email VARCHAR(50) NOT NULL PRIMARY KEY
) engine=ndbcluster;
2) Generate some test data, only values part:
for i in {1..1000}; do echo "('kalle$i@gmail.com'),"; done > 1000-VALUES.sql
for i in {1..500}; do echo "('kalle$i@gmail.com'),"; done > 500-VALUES.sql
3) Run some tests:
Insert 500 values in one INSERT:
[opc@student1-server1 ~]$ time mysql -uroot -S /tmp/mysql.mycluster.50.sock ted < INSERT-500.sql
real 0m0.024s
user 0m0.006s
sys 0m0.002s
Insert 1000 values using INSERT IGNORE (500 duplicates)
[opc@student1-server1 ~]$ time mysql -uroot -S /tmp/mysql.mycluster.50.sock ted < INSERT-IGNORE-1000.sql
real 0m0.695s
user 0m0.005s
sys 0m0.003s
Truncate table and run some more test:
time mysql -uroot -S /tmp/mysql.mycluster.50.sock ted -se "truncate table subscribers"
Insert 1000 values:
[opc@student1-server1 ~]$ time mysql -uroot -S /tmp/mysql.mycluster.50.sock ted < INSERT-1000.sql
real 0m0.025s
user 0m0.007s
sys 0m0.001s
So, to summarize:
- Insert of 500 rows 0.024s
- Insert of 1000 rows 0.025s
- Insert ignore (with 500 duplicates) 0.695s ...
As you can see INSERT IGNORE is not optimal, it's much better (quicker) to first read all rows, remove duplicates from insert statement and then run the insert.
Last test, insert ignore with all duplicates:
First insert 1000 rows:
[opc@student1-server1 ~]$ time mysql -uroot -S /tmp/mysql.mycluster.50.sock ted < INSERT-1000.sql
real 0m0.025s
user 0m0.007s
sys 0m0.001s
Then insert 1000 rows with INSERT IGNORE (all duplicates)
[opc@student1-server1 ~]$ time mysql -uroot -S /tmp/mysql.mycluster.50.sock ted < INSERT-IGNORE-1000.sql
real 0m0.402s
user 0m0.003s
sys 0m0.004s
And again ....
[opc@student1-server1 ~]$ time mysql -uroot -S /tmp/mysql.mycluster.50.sock ted < INSERT-IGNORE-1000.sql
real 0m0.419s
user 0m0.007s
sys 0m0.001s