subquery vs. join?
My job is to analyze the logined users of a homepage. I need to do a join (or subquery) between log and ip table, but neither of them is efficent enough. Does anybody could help me to tune this query?
Here are the details:
create table log(
uid int ,
login_ip int unsigned,
login_time datetime,
index index_login_ip (login_ip),
index index_uid (uid);
);
log table is about 4 billion rows.
create table ip_region(
start_ip int unsigned,
end_ip int unsigned,
speed_type ENUM('dial', 'DSL', 'ISDN',"LAN'...)
index index_start_ip (start_ip),
index index_end_ip(end_ip)
)
ip_region table is about 120 thousand rows.
I would like to see the speed_type distribution of the logined users. Two solustions:
select speed_type, count(*) from log, ip_region
where login_ip between start_ip and end_ip
group by speed_type;
+----+-------------+------------------+------+----------------+------+---------+------+-----------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+------+----------------+------+---------+------+-----------+------------------------------------------------+
| 1 | SIMPLE | ip_region | ALL | NULL | NULL | NULL | NULL | 120142 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | log | ALL | index_login_ip | NULL | NULL | NULL | 4021986877 | Range checked for each record (index map: 0x2) |
+----+-------------+------------------+------+----------------+------+---------+------+-----------+------------------------------------------------+
when the log table is less than 20 million rows, the query takes 10 minutes, but when the log table become 100 million rows, it takes endless time to finish(I only waited for 15 hours).
I notice that the ip_regions is non-intersected (that is, an ip belongs to only one ip_region), so I write the subquery as follows:
select speed_type, count(*)
from log, ip_region
where start_ip = (select start_ip from ip_region where end_ip>=login_ip limit 1)
and flogin_ip>=start_ip
group by speed_type;
+----+--------------------+-----------+-------+----------------+----------------+---------+------+-----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-----------+-------+----------------+----------------+---------+------+-----------+----------------------------------------------+
| 1 | PRIMARY | log | index | index_login_ip | index_login_ip | 4 | NULL | 4021986877 | Using index; Using temporary; Using filesort |
| 1 | PRIMARY | ip_region | ref | index_start_ip | index_start_ip | 4 | func | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | ip_region | ALL | index_end_ip | NULL | NULL | NULL | 120142 | Using where |
+----+--------------------+-----------+-------+----------------+----------------+---------+------+-----------+----------------------------------------------+
It will be good if the subquery can utilize the index_end_ip, but it does not. Even I use "force index (index_end_ip) in the subquery, it still refuses to use the index.
Please help me to tune this query.