MySQL :: Problem with optimization of a WHERE clause

New Topic

Problem with optimization of a WHERE clause

Posted by: Giorgio Calderone
Date: August 02, 2007 12:41PM

#
# I found a case where the optimizer performs an optimization which is
# not supposed to do. This happens on MySQL ver. 5.1.19 and 5.1.20 but
# not on older 5.0.27.
#
# Suppose we have a table of data like the following:
#
DROP TABLE IF EXISTS main;
CREATE TABLE main (id INT, c CHAR(10), f FLOAT, INDEX (id));

INSERT INTO main VALUES(0, 'NORTH' , 0);
INSERT INTO main VALUES(0, 'NORTH-EAST', 45);
INSERT INTO main VALUES(1, 'EAST' , 90);
INSERT INTO main VALUES(1, 'SOUTH-EAST', 135);
INSERT INTO main VALUES(2, 'SOUTH' , 180);
INSERT INTO main VALUES(2, 'SOUTH-WEST', 225);
INSERT INTO main VALUES(3, 'WEST' , 270);
INSERT INTO main VALUES(3, 'NORTH-WEST', 315);

SELECT * FROM main;

#
# and a table of indexes used to select records from the `main` table:
#
DROP TABLE IF EXISTS ref;
CREATE TABLE ref (id INT, flag INT);

INSERT INTO ref VALUES (0, 1);
INSERT INTO ref VALUES (1, 0);
INSERT INTO ref VALUES (2, 0);

SELECT * FROM ref;

#
# The first field (`id`) will be used to match records in the `main`
# table, the second field (`flag`) has a special meaning that will be
# explained later.
#
# Now we want to select all records from the `main` table whose `id` is
# present in the `ref` table, a simple join can do the work:
#
SELECT *
FROM main JOIN ref USING (id);

#
# Now we go a little further and try to use the `flag` field, in
# particular if it is 1 the matched record on the `main` table can be
# selected without any other check, if it is 0 another check on the
# `f` field need to be performed. The following query do the work:
#
SELECT *
FROM main INNER JOIN ref USING (id)
WHERE (
(ref.flag = 1) OR
(main.f < 200)
);

#+------+------------+------+------+
#| id | c | f | flag |
#+------+------------+------+------+
#| 0 | NORTH | 0 | 1 |
#| 0 | NORTH-EAST | 45 | 1 |
#| 1 | EAST | 90 | 0 |
#| 1 | SOUTH-EAST | 135 | 0 |
#| 2 | SOUTH | 180 | 0 |
#+------+------------+------+------+
#5 rows in set (0.00 sec)

#
# The result is correct. Note the way the query is executed (through the
# EXPLAIN statement before SELECT): the `ref` table has no notes in
# the `Extra` column:
#
#+----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+
#| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
#+----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+
#| 1 | SIMPLE | ref | ALL | NULL | NULL | NULL | NULL | 3 | |
#| 1 | SIMPLE | main | ALL | id | NULL | NULL | NULL | 8 | Using where; Using join buffer |
#+----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+

#
# The real case from which this example is taken is of course much
# more complex, and the check on the `f` field must be performed
# through a user defined function. In this simple example it is enough
# to use the following:
#
delimiter //
DROP FUNCTION IF EXISTS fiden //
CREATE FUNCTION fiden(f FLOAT) RETURNS FLOAT
NOT DETERMINISTIC
BEGIN
RETURN f;
END//
delimiter ;
#
# which is a function that simply returns its argument. The previous
# query can now be written:
#
SELECT *
FROM main INNER JOIN ref USING (id)
WHERE (
(ref.flag = 1) OR
(fiden(main.f) < 200)
);

#
#+------+------------+------+------+
#| id | c | f | flag |
#+------+------------+------+------+
#| 0 | NORTH | 0 | 1 |
#| 0 | NORTH-EAST | 45 | 1 |
#+------+------------+------+------+
#2 rows in set (0.00 sec)
#
# The result should have been the same as before, instead only those
# record with `flag = 1` are shown. This behaviour is due to the fact
# that the server optimizes the query and executes the WHERE clause on
# the `ref` table before performing the join (as can be seen using the
# EXPLAIN statement), so only those records with `flag = 1` are
# available.
#
#+----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+
#| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
#+----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+
#| 1 | SIMPLE | ref | ALL | NULL | NULL | NULL | NULL | 3 | Using where |
#| 1 | SIMPLE | main | ALL | id | NULL | NULL | NULL | 8 | Using where; Using join buffer |
#+----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+
#
# In my opinion this optimization should not take place because the
# WHERE clause contains an OR operator. Of course it would be correct
# with an AND operator.
#
# Am I missing something ?
#
# Thanks in advance,
# Giorgio.

Navigate: Previous Message• Next Message

Options: Reply• Quote

Subject

Views

Written By

Posted

Problem with optimization of a WHERE clause

3808

Giorgio Calderone

August 02, 2007 12:41PM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.