MySQL Forums
Forum List  »  Optimizer & Parser

Update with dependent subquery needs optimization!
Posted by: Lon B
Date: December 06, 2006 05:23PM

Hey all -

These forums have been really helpful from just reading, but now I need some specific help. Any feedback would be greatly appreciated.

I'm in MySQL 4.1.12a, all tables are MyISAM.

There are three tables: interval_span, impression, and ad_unit. The idea is that the impression table has records that need to be counted based primarily on their network_id and ad_unit_type_id (which is retrieved via a join to ad_unit), and secondarily on the view_time being within the applicable date range.

Some table structure details:
mysql> desc interval_span;
+------------------+----------+------+-----+---------------------+----------------+
| Field            | Type     | Null | Key | Default             | Extra         |
+------------------+----------+------+-----+---------------------+----------------+
| id               | int(11)  |      | PRI | NULL                | auto_increment|
| ad_unit_type_id  | int(11)  |      | PRI | 0                   |                |
| network_id       | int(11)  |      | MUL | 0                   |                |
| begin_date       | datetime |      | PRI | 0000-00-00 00:00:00 |                |
| end_date         | datetime |      | PRI | 0000-00-00 00:00:00 |                |
| roadblock_id     | int(11)  | YES  | MUL | NULL                |                |
| expected_traffic | int(11)  | YES  |     | NULL                |                |
| impression_count | int(11)  | YES  |     | NULL                |                |
+------------------+----------+------+-----+---------------------+----------------+
8 rows in set (0.00 sec)

mysql> show index from interval_span;
+---------------+------------+---------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| Table         | Non_unique | Key_name            | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------------+------------+---------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| interval_span |          0 | PRIMARY             |            1 | id              | A         |      106396 |     NULL | NULL   |      | BTREE      |         |
| interval_span |          0 | PRIMARY             |            2 | ad_unit_type_id | A         |      106396 |     NULL | NULL   |      | BTREE      |         |
| interval_span |          0 | PRIMARY             |            3 | begin_date      | A         |      106396 |     NULL | NULL   |      | BTREE      |         |
| interval_span |          0 | PRIMARY             |            4 | end_date        | A         |      106396 |     NULL | NULL   |      | BTREE      |         |
| interval_span |          1 | interval_span_FI_1  |            1 | ad_unit_type_id | A         |          13 |     NULL | NULL   |      | BTREE      |         |
| interval_span |          1 | interval_span_FI_2  |            1 | network_id      | A         |           5 |     NULL | NULL   |      | BTREE      |         |
| interval_span |          1 | interval_span_FI_3  |            1 | roadblock_id    | A         |      106396 |     NULL | NULL   | YES  | BTREE      |         |
| interval_span |          1 | interval_span_idx_1 |            1 | begin_date      | A         |        1899 |     NULL | NULL   |      | BTREE      |         |
| interval_span |          1 | interval_span_idx_2 |            1 | end_date        | A         |        1899 |     NULL | NULL   |      | BTREE      |         |
+---------------+------------+---------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
9 rows in set (0.00 sec)

mysql> select count(*) from interval_span;
+----------+
| count(*) |
+----------+
|   106396 |
+----------+

mysql> desc impression;
+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| id          | int(11)      |      | PRI | NULL    | auto_increment |
| view_time   | datetime     | YES  | MUL | NULL    |                |
| ad_unit_id  | int(11)      |      | PRI | 0       |                |
| network_id  | int(11)      | YES  | MUL | NULL    |                |
| remote_addr | varchar(255) | YES  |     | NULL    |                |
+-------------+--------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

mysql> show index from impression;
+------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table      | Non_unique | Key_name         | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| impression |          0 | PRIMARY          |            1 | id          | A         |     2842354 |     NULL | NULL   |      | BTREE      |         |
| impression |          0 | PRIMARY          |            2 | ad_unit_id  | A         |     2842354 |     NULL | NULL   |      | BTREE      |         |
| impression |          1 | impression_FI_1  |            1 | ad_unit_id  | A         |          50 |     NULL | NULL   |      | BTREE      |         |
| impression |          1 | impression_FI_2  |            1 | network_id  | A         |           1 |     NULL | NULL   | YES  | BTREE      |         |
| impression |          1 | impression_idx_1 |            1 | view_time   | A         |       94745 |     NULL | NULL   | YES  | BTREE      |         |
| impression |          1 | impression_idx_2 |            1 | view_time   | A         |       94745 |     NULL | NULL   | YES  | BTREE      |         |
| impression |          1 | impression_idx_2 |            2 | network_id  | A         |       94745 |     NULL | NULL   | YES  | BTREE      |         |
+------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
7 rows in set (0.00 sec)

mysql> select count(*) from impression;
+----------+
| count(*) |
+----------+
|  2842354 |
+----------+

Now, to the meat of it... Here's the update query we're attempting:

UPDATE interval_span span 
   SET span.impression_count = (
			SELECT count(imp.id)
			  FROM impression imp 
						 INNER JOIN ad_unit au ON imp.ad_unit_id = au.id
			 WHERE imp.view_time >= span.begin_date
			   AND imp.view_time < span.end_date
			   AND imp.network_id = span.network_id
			   AND au.ad_unit_type_id = span.ad_unit_type_id
)
WHERE span.begin_date <= '2006-12-03 04:40:00' 
  AND span.begin_date >= '2006-12-03 04:00:00';

This is the smallest date range I can set up (interval_span begin/end date ranges are 30 minutes long). This update query returns:
Query OK, 3 rows affected (2 min 13.50 sec)
Rows matched: 70  Changed: 3  Warnings: 0

I need to get this time WAY down. Here are some notes I took while testing:

1. Running inner select count query with constant values takes 2.2 sec
2. Running whole update without inner join on ad_unit takes much longer
3. Creating the impression_idx_2 index on impression(view_time, network_id) reduced execution by 10%

Ok, here's some more notes, that hopefully will lend more insight. In order to try to use mysql's explain tool, I re-wrote the query as a select, it runs in almost identical time, so I think I've got it right, but haven't figured out how to optimize it further:

SELECT span.id, count(imp.id)
  FROM interval_span span 
	     INNER JOIN impression imp ON imp.network_id = span.network_id
			 INNER JOIN ad_unit au ON imp.ad_unit_id = au.id
 WHERE span.begin_date <= '2006-12-03 04:40:00' 
   AND span.begin_date >= '2006-12-03 04:00:00'
   AND imp.view_time >= span.begin_date
   AND imp.view_time < span.end_date
   AND au.ad_unit_type_id = span.ad_unit_type_id
GROUP BY span.id;

+--------+---------------+
| id     | count(imp.id) |
+--------+---------------+
| 219136 |         19013 |
| 219146 |         58026 |
| 219156 |         35805 |
+--------+---------------+
3 rows in set (2 min 21.20 sec)

Any insight you could lend would be GREATLY appreciated! I've been working on this for quite some time. Please feel free to publish this note and your response, I've found your responses to others very helpful, I think this could help others as well.

Thanks,

Lon

Options: ReplyQuote


Subject
Views
Written By
Posted
Update with dependent subquery needs optimization!
4351
December 06, 2006 05:23PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.