MySQL Forums
Forum List  »  NDB clusters

Inner Join significantly slower then subselect on ndbmtd
Posted by: Roman Brunnemann
Date: April 08, 2014 03:06AM

I have two data nodes and found out a strange behaviour:

I have a table raw_data with a column user_id. raw_data has approx. 300.000.000 rows with 417.000 different user_id's.


Now when I run similar queries with very different results. There is a key on "date" which results in 11.900.000 rows for the selection below:

1. Only the raw_data table:

select user_id, count(*)
from raw_data
where date between '2014-03-01' and '2014-03-05'
group by user_id
order by count(*)
limit 100;

Execution time is approx. 15 seconds.

2. Now I want to join the user table in that query:

select user.name, user_id, count(*)
from raw_data rd inner join user u on rd.user_id = u.id
where date between '2014-03-01' and '2014-03-05'
group by user_id
order by count(*)
limit 100;

Execution time is more then 20 minutes.

This is the output of the explain statement:

+----+-------------+-------+--------+---------------------------+---------+---------+------------------------+----------+-------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------------+---------+---------+------------------------+----------+-------------------------------------------------------------------------------+
| 1 | SIMPLE | i | range | search,date,user_id_id | date | 3 | NULL | 11905284 | Using where with pushed condition; Using MRR; Using temporary; Using filesort |
| 1 | SIMPLE | a | eq_ref | PRIMARY | PRIMARY | 4 | report.d.user_id | 1 | Using where |
+----+-------------+-------+--------+---------------------------+---------+---------+------------------------+----------+-------------------------------------------------------------------------------+

3. I used a subselect from try number 1 to join user table and this was as fast as try number 1:

select u.name, user_id, hits
from (select user_id, count(*) as hits
from raw_data
where date between '2014-03-01' and '2014-03-05'
group by user_id
order by count(*)
limit 100 ) t1
inner join user u on u.id = t1.user_id;

Execution time is the same as in query 1, approx. 15 seconds.

So why is the join that much slower?

Options: ReplyQuote


Subject
Views
Written By
Posted
Inner Join significantly slower then subselect on ndbmtd
1650
April 08, 2014 03:06AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.