Update with dependent subquery needs optimization!
Posted by: Lon B
Date: December 06, 2006 05:23PM
Date: December 06, 2006 05:23PM
Hey all -
These forums have been really helpful from just reading, but now I need some specific help. Any feedback would be greatly appreciated.
I'm in MySQL 4.1.12a, all tables are MyISAM.
There are three tables: interval_span, impression, and ad_unit. The idea is that the impression table has records that need to be counted based primarily on their network_id and ad_unit_type_id (which is retrieved via a join to ad_unit), and secondarily on the view_time being within the applicable date range.
Some table structure details:
Now, to the meat of it... Here's the update query we're attempting:
This is the smallest date range I can set up (interval_span begin/end date ranges are 30 minutes long). This update query returns:
I need to get this time WAY down. Here are some notes I took while testing:
1. Running inner select count query with constant values takes 2.2 sec
2. Running whole update without inner join on ad_unit takes much longer
3. Creating the impression_idx_2 index on impression(view_time, network_id) reduced execution by 10%
Ok, here's some more notes, that hopefully will lend more insight. In order to try to use mysql's explain tool, I re-wrote the query as a select, it runs in almost identical time, so I think I've got it right, but haven't figured out how to optimize it further:
Any insight you could lend would be GREATLY appreciated! I've been working on this for quite some time. Please feel free to publish this note and your response, I've found your responses to others very helpful, I think this could help others as well.
Thanks,
Lon
These forums have been really helpful from just reading, but now I need some specific help. Any feedback would be greatly appreciated.
I'm in MySQL 4.1.12a, all tables are MyISAM.
There are three tables: interval_span, impression, and ad_unit. The idea is that the impression table has records that need to be counted based primarily on their network_id and ad_unit_type_id (which is retrieved via a join to ad_unit), and secondarily on the view_time being within the applicable date range.
Some table structure details:
mysql> desc interval_span; +------------------+----------+------+-----+---------------------+----------------+ | Field | Type | Null | Key | Default | Extra | +------------------+----------+------+-----+---------------------+----------------+ | id | int(11) | | PRI | NULL | auto_increment| | ad_unit_type_id | int(11) | | PRI | 0 | | | network_id | int(11) | | MUL | 0 | | | begin_date | datetime | | PRI | 0000-00-00 00:00:00 | | | end_date | datetime | | PRI | 0000-00-00 00:00:00 | | | roadblock_id | int(11) | YES | MUL | NULL | | | expected_traffic | int(11) | YES | | NULL | | | impression_count | int(11) | YES | | NULL | | +------------------+----------+------+-----+---------------------+----------------+ 8 rows in set (0.00 sec) mysql> show index from interval_span; +---------------+------------+---------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | +---------------+------------+---------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+ | interval_span | 0 | PRIMARY | 1 | id | A | 106396 | NULL | NULL | | BTREE | | | interval_span | 0 | PRIMARY | 2 | ad_unit_type_id | A | 106396 | NULL | NULL | | BTREE | | | interval_span | 0 | PRIMARY | 3 | begin_date | A | 106396 | NULL | NULL | | BTREE | | | interval_span | 0 | PRIMARY | 4 | end_date | A | 106396 | NULL | NULL | | BTREE | | | interval_span | 1 | interval_span_FI_1 | 1 | ad_unit_type_id | A | 13 | NULL | NULL | | BTREE | | | interval_span | 1 | interval_span_FI_2 | 1 | network_id | A | 5 | NULL | NULL | | BTREE | | | interval_span | 1 | interval_span_FI_3 | 1 | roadblock_id | A | 106396 | NULL | NULL | YES | BTREE | | | interval_span | 1 | interval_span_idx_1 | 1 | begin_date | A | 1899 | NULL | NULL | | BTREE | | | interval_span | 1 | interval_span_idx_2 | 1 | end_date | A | 1899 | NULL | NULL | | BTREE | | +---------------+------------+---------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+ 9 rows in set (0.00 sec) mysql> select count(*) from interval_span; +----------+ | count(*) | +----------+ | 106396 | +----------+ mysql> desc impression; +-------------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-------------+--------------+------+-----+---------+----------------+ | id | int(11) | | PRI | NULL | auto_increment | | view_time | datetime | YES | MUL | NULL | | | ad_unit_id | int(11) | | PRI | 0 | | | network_id | int(11) | YES | MUL | NULL | | | remote_addr | varchar(255) | YES | | NULL | | +-------------+--------------+------+-----+---------+----------------+ 5 rows in set (0.00 sec) mysql> show index from impression; +------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | +------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ | impression | 0 | PRIMARY | 1 | id | A | 2842354 | NULL | NULL | | BTREE | | | impression | 0 | PRIMARY | 2 | ad_unit_id | A | 2842354 | NULL | NULL | | BTREE | | | impression | 1 | impression_FI_1 | 1 | ad_unit_id | A | 50 | NULL | NULL | | BTREE | | | impression | 1 | impression_FI_2 | 1 | network_id | A | 1 | NULL | NULL | YES | BTREE | | | impression | 1 | impression_idx_1 | 1 | view_time | A | 94745 | NULL | NULL | YES | BTREE | | | impression | 1 | impression_idx_2 | 1 | view_time | A | 94745 | NULL | NULL | YES | BTREE | | | impression | 1 | impression_idx_2 | 2 | network_id | A | 94745 | NULL | NULL | YES | BTREE | | +------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ 7 rows in set (0.00 sec) mysql> select count(*) from impression; +----------+ | count(*) | +----------+ | 2842354 | +----------+
Now, to the meat of it... Here's the update query we're attempting:
UPDATE interval_span span SET span.impression_count = ( SELECT count(imp.id) FROM impression imp INNER JOIN ad_unit au ON imp.ad_unit_id = au.id WHERE imp.view_time >= span.begin_date AND imp.view_time < span.end_date AND imp.network_id = span.network_id AND au.ad_unit_type_id = span.ad_unit_type_id ) WHERE span.begin_date <= '2006-12-03 04:40:00' AND span.begin_date >= '2006-12-03 04:00:00';
This is the smallest date range I can set up (interval_span begin/end date ranges are 30 minutes long). This update query returns:
Query OK, 3 rows affected (2 min 13.50 sec) Rows matched: 70 Changed: 3 Warnings: 0
I need to get this time WAY down. Here are some notes I took while testing:
1. Running inner select count query with constant values takes 2.2 sec
2. Running whole update without inner join on ad_unit takes much longer
3. Creating the impression_idx_2 index on impression(view_time, network_id) reduced execution by 10%
Ok, here's some more notes, that hopefully will lend more insight. In order to try to use mysql's explain tool, I re-wrote the query as a select, it runs in almost identical time, so I think I've got it right, but haven't figured out how to optimize it further:
SELECT span.id, count(imp.id) FROM interval_span span INNER JOIN impression imp ON imp.network_id = span.network_id INNER JOIN ad_unit au ON imp.ad_unit_id = au.id WHERE span.begin_date <= '2006-12-03 04:40:00' AND span.begin_date >= '2006-12-03 04:00:00' AND imp.view_time >= span.begin_date AND imp.view_time < span.end_date AND au.ad_unit_type_id = span.ad_unit_type_id GROUP BY span.id; +--------+---------------+ | id | count(imp.id) | +--------+---------------+ | 219136 | 19013 | | 219146 | 58026 | | 219156 | 35805 | +--------+---------------+ 3 rows in set (2 min 21.20 sec)
Any insight you could lend would be GREATLY appreciated! I've been working on this for quite some time. Please feel free to publish this note and your response, I've found your responses to others very helpful, I think this could help others as well.
Thanks,
Lon
Subject
Views
Written By
Posted
2199
December 07, 2006 02:04AM
Sorry, you can't reply to this topic. It has been closed.
Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.