This is a re-post of
my original here.
I am very curious to know how to write the following join query for
MySQL so it will perform as well as
PostgreSQL. I never used PostgreSQL before now. I like MySQL and I always use it. However I was disappointed by its performance on this query and I installed PostgreSQL to compare. The purpose of this query is to map ip addresses to countries. On my CentOS 5 machine, MySQL 5.0.45 takes
10 mins 45 seconds, PostgreSQL 8.1.11 takes
0 mins 1.3 seconds. Yes
MySQL is 496 times slower. I'm sure there must be a way to hint MySQL to run faster. Here is the query:
mysql> select range.id_country from address join range on address.address between range.begin_num and range.end_num;
mysql> describe address;
+---------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+-------+
| address | int(10) unsigned | YES | | NULL | |
+---------+------------------+------+-----+---------+-------+
mysql> describe range;
+-------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+-------+
| begin_num | int(10) unsigned | NO | PRI | | |
| end_num | int(10) unsigned | YES | UNI | NULL | |
| id_country | tinyint(3) unsigned | YES | MUL | NULL | |
+-------------+---------------------+------+-----+---------+-------+
Both tables are
MyISAM type. Table `address` is 2124 rows (all distinct). Table `range` is 105920 rows (all distinct).
The best answer I found so far is
here. Steinbrink gives a way to write it as a join on subquery that my MySQL finished in
6 min 42 sec (63% the time of the simple join version). That is still 310 times slower than PostgreSQL. Too slow!
Actually there is a small error in the SQL at
that url:
ORDER BY ip_address DESC
should be:
ORDER BY start DESC
Here is MySQL's explanation of the original query:
mysql> explain select range.id_country from address join range on address.address between range.begin_num and range.end_num;
+----+-------------+---------+------+-----------------+------+---------+------+--------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+-----------------+------+---------+------+--------+------------------------------------------------+
| 1 | SIMPLE | address | ALL | NULL | NULL | NULL | NULL | 2124 | |
| 1 | SIMPLE | range | ALL | PRIMARY,end_num | NULL | NULL | NULL | 105920 | Range checked for each record (index map: 0x7) |
+----+-------------+---------+------+-----------------+------+---------+------+--------+------------------------------------------------+
Here is PostgreSQL's explanation of the original query:
postgresql# explain select range.id_country from address join range on address.address between range.begin_num and range.end_num;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Nested Loop (cost=5.72..7061709.90 rows=83990942 width=2)
-> Seq Scan on range (cost=0.00..3316.47 rows=185547 width=18)
-> Bitmap Heap Scan on address (cost=5.72..31.25 rows=453 width=8)
Recheck Cond: ((address.address >= "outer".begin_num) AND (address.address <= "outer".end_num))
-> Bitmap Index Scan on addresses_pkey (cost=0.00..5.72 rows=453 width=0)
Index Cond: ((address.address >= "outer".begin_num) AND (address.address <= "outer".end_num))
Here is MySQL's explanation showing that a simple query can use the index on begin_num in a "range" type query plan:
mysql> explain select id_country from range where 123123123 between begin_num and end_num;
+----+-------------+---------+-------+-----------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+-----------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | range | range | PRIMARY,end_num | PRIMARY | 4 | NULL | 35 | Using where |
+----+-------------+---------+-------+-----------------+---------+---------+------+------+-------------+
Others encountered this problem before.
References:
[*]
MySQL Forums :: Optimizer :: How to optimize a range (best answer I found but not good enough)
[*]
MySQL Performance Blog: Optimize IP Range Join /
Daniweb: Optimize IP Range Join - MySQL
[*]
MySQL Performance Blog: Range Optimization
[*]
MySQL :: inner join on range criteria: unable to use index?
[*]
Sergey Petrunia's blog Use of join buffer is now visible in EXPLAIN
[*]
MySQL :: MySQL 5.0 Reference Manual :: 7.2.5 Range Optimization
[*]
MySQL Bugs: #26963: Incorrect query results with CONST join compared to RANGE or ALL
[*]
MySQL Bugs: #9693: bug in trying to join a value on a range (using indexes)