MySQL Forums
Forum List  »  Performance

subquery vs. join?
Posted by: daofeng luo
Date: December 03, 2004 02:52AM

My job is to analyze the logined users of a homepage. I need to do a join (or subquery) between log and ip table, but neither of them is efficent enough. Does anybody could help me to tune this query?

Here are the details:

create table log(
uid int ,
login_ip int unsigned,
login_time datetime,
index index_login_ip (login_ip),
index index_uid (uid);
);
log table is about 4 billion rows.

create table ip_region(
start_ip int unsigned,
end_ip int unsigned,
speed_type ENUM('dial', 'DSL', 'ISDN',"LAN'...)
index index_start_ip (start_ip),
index index_end_ip(end_ip)
)
ip_region table is about 120 thousand rows.

I would like to see the speed_type distribution of the logined users. Two solustions:

select speed_type, count(*) from log, ip_region
where login_ip between start_ip and end_ip
group by speed_type;
+----+-------------+------------------+------+----------------+------+---------+------+-----------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+------+----------------+------+---------+------+-----------+------------------------------------------------+
| 1 | SIMPLE | ip_region | ALL | NULL | NULL | NULL | NULL | 120142 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | log | ALL | index_login_ip | NULL | NULL | NULL | 4021986877 | Range checked for each record (index map: 0x2) |
+----+-------------+------------------+------+----------------+------+---------+------+-----------+------------------------------------------------+
when the log table is less than 20 million rows, the query takes 10 minutes, but when the log table become 100 million rows, it takes endless time to finish(I only waited for 15 hours).

I notice that the ip_regions is non-intersected (that is, an ip belongs to only one ip_region), so I write the subquery as follows:
select speed_type, count(*)
from log, ip_region
where start_ip = (select start_ip from ip_region where end_ip>=login_ip limit 1)
and flogin_ip>=start_ip
group by speed_type;
+----+--------------------+-----------+-------+----------------+----------------+---------+------+-----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-----------+-------+----------------+----------------+---------+------+-----------+----------------------------------------------+
| 1 | PRIMARY | log | index | index_login_ip | index_login_ip | 4 | NULL | 4021986877 | Using index; Using temporary; Using filesort |
| 1 | PRIMARY | ip_region | ref | index_start_ip | index_start_ip | 4 | func | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | ip_region | ALL | index_end_ip | NULL | NULL | NULL | 120142 | Using where |
+----+--------------------+-----------+-------+----------------+----------------+---------+------+-----------+----------------------------------------------+
It will be good if the subquery can utilize the index_end_ip, but it does not. Even I use "force index (index_end_ip) in the subquery, it still refuses to use the index.

Please help me to tune this query.

Options: ReplyQuote


Subject
Views
Written By
Posted
subquery vs. join?
20617
December 03, 2004 02:52AM
7201
December 05, 2004 03:03AM
6017
December 05, 2004 07:52PM
6252
January 20, 2005 03:48AM
5376
January 20, 2005 06:52AM
4735
January 21, 2005 11:43PM
4925
January 21, 2005 11:38PM
3939
January 20, 2005 04:41PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.