MySQL Forums
Forum List  »  Optimizer & Parser

Parallel scans
Posted by: Steve Fink
Date: May 19, 2009 04:15PM

I'll be the first to admit that I don't know a whole lot about databases, but I was recently running into a performance problem with a query that mysql implemented as two nested full table scans. If I were to implement the join outside of a database, I would do it very differently.

Here's a stripped down example: say you have two tables

Create Table: CREATE TABLE `t1` (
  `prim` int(11) NOT NULL auto_increment,
  `f` varchar(20) default NULL,
  PRIMARY KEY  (`prim`),
  KEY `f` (`f`)
) ENGINE=InnoDB AUTO_INCREMENT=2913 DEFAULT CHARSET=latin1

CREATE TABLE `t2` (
  `prim` int(11) NOT NULL auto_increment,
  `f` varchar(20) default NULL,
  PRIMARY KEY  (`prim`)
) ENGINE=InnoDB AUTO_INCREMENT=93185 DEFAULT CHARSET=latin1

Now say I wanted to do this query:

mysql> explain select t1.prim,t2.f from t1 left outer join t2 on t1.f=t2.f;
+----+-------------+-------+-------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+-------+-------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | t1    | index | NULL          | f    | 23      | NULL |  3075 | Using index | 
|  1 | SIMPLE      | t2    | ALL   | NULL          | NULL | NULL    | NULL | 88416 |             | 
+----+-------------+-------+-------+---------------+------+---------+------+-------+-------------+

I printed out the table statuses at the end of this post.

So mysql is going to scan through the t1's index on f, and for every row, it'll do a full table scan of t2. But that doesn't make a whole lot of sense -- especially since I happened to have ordered t2 on the f column. What I would want to happen is for t2.f to be filesorted (if it wasn't already ordered that way) and then t1's f index and the sorted list to be scanned in parallel: for every row of t1.f, check whether there is a matching t2.f. If not, output a NULL and advance t1.f. If so, output a row for every matching t2.f, then advance t1.f. If it changes, advance t2.f. Or something like that.

As I understand mysql's optimizer, it never considers this option. It always arranges the tables in some order, then grabs one row for the first table (or index on the table or whatever), then does the next table's lookup operation, then the next, etc. until the full join is processed. So there's no way of describing a parallel scan.

But it doesn't feel like it would be that much of a stretch to add one in, which makes me wonder if I'm just missing something and it's already there. Can anyone confirm or deny it?

The most straightforward way that I can think of wedging in this operation, assuming it doesn't already exist, would be to either add a 'Extra' to type 'ALL' which says to use a cursor (sorry; that's an overloaded term), or to make a new type. Either way, you execute the query as if it were nested full table scans or whatever, but you know that the inner table scan can always start from (almost) where it left off last time. The cursor would always scan until just before the next one greater than the search key, and wouldn't actually advance for the purposes of the next iteration if the search key was found. But the point is that this could preserve the general structure of mysql's query plans while allowing an optimization that it seems to me would be applicable in quite a few situations.

(There are additional wrinkles that I can think of of the top of my head. For triply-nested loops, or worse, you'd need to reset the cursor to the beginning when the outer outer loop advanced. And you could simplify the cursor advancement logic if either the outer or the inner scan was known to have unique values. And I'm sure many other things.)

Am I crazy or just naive? In the case of my query, this would bring the number of rows examined down from the product of the two tables sizes to their sum, which seems worthwhile.

----

mysql> show table status like 't1' \G
*************************** 1. row ***************************
           Name: t1
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 2528
 Avg_row_length: 64
    Data_length: 163840
Max_data_length: 0
   Index_length: 196608
      Data_free: 0
 Auto_increment: 2913
    Create_time: 2009-05-18 12:03:08
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: 
        Comment: InnoDB free: 1379328 kB
1 row in set (0.00 sec)

mysql> show table status like 't2' \G
*************************** 1. row ***************************
           Name: t2
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 90870
 Avg_row_length: 63
    Data_length: 5783552
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: 93185
    Create_time: 2009-05-19 14:35:17
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: 
        Comment: InnoDB free: 4096 kB

Options: ReplyQuote


Subject
Views
Written By
Posted
Parallel scans
3841
May 19, 2009 04:15PM
2174
May 20, 2009 10:39PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.