MySQL Forums
Forum List  »  Performance

Re: ideal indexing for inner join, rowcount of 1
Posted by: Rick James
Date: March 12, 2011 10:19AM

> I am reading an indexing book that says that the ideal index for a single table select includes all the where clause columns first, the order by columns next, and then any additional select columns. This index avoids hitting the table itself.

In general, that is correct with MySQL (either InnoDB or MyISAM). When you do EXPLAIN SELECT ..., it will indicate that by saying "Using index". However, that only applies to one table at a time. With a JOIN, you cannot get all the fields from both tables into a single index.

Your query,
SELECT  d.imageUrl
    FROM  CategoryParentChild pc
    INNER JOIN  Document d ON pc.childId = d.id
    WHERE  pc.parentPath like :likepath
      AND  d.imageUrl IS NOT NULL
    ORDER BY  pc.parentPath
    LIMIT  0, 1
has other issues...
When you start LIKE with a wild card, it cannot make much use of the index. You have "string%", so it can do a "range" scan, which is reasonably efficient.

IS NOT NULL -- This also does not make good use of an index.

To better discuss your case, please provide
* SHOW CREATE TABLE tbl\G -- engine, indexes
* SHOW TABLE STATUS LIKE 'tbl'\G -- sizes
* EXPLAIN SELECT ...\G -- clues of inefficiencies
* SHOW VARIABLES LIKE '%buffer%'; -- cache size
and surround them with [ code ] and [ / code ]

MySQL will pick one of the two tables to start with, then it will reach (repeatedly) into the other table. It is not obvious (without the EXPLAIN) which table it will start with. For now, I'll guess that it will pick pc because of "pc.parentPath like 'string%'". That would beg for an index _starting_ with parentPath, as you suggested. Then, as you suggested, you it can do "Using index" because you have
INDEX(parentPath, childId)
since that includes all the necessary fields from pc. As an extra bonus, it will probably notice that that takes care of the ORDER BY, thereby eliminating a sort pass. This is actually critical -- if it cannot get the rows in order, it will have to collect _all_ possible rows (from both tables), then sort them, and finally deliver the LIMIT that you want. (I often encounter JOIN-WHERE-ORDER-LIMIT where there is no way to prevent the sort.)

That's a good start, but it still has to reach into d to check imageUrl for being NULL. If lots are NULL, it could take a while.

To get "Using index" on d, you would need (as you say)
INDEX(id, imageUrl)
Actually, either order should work.

That is, your indexes and your analyses were correct.

> Question 2 is with a LIMIT 0,1, will the query engine stop the query early or get all the rows anyway?
I don't know for sure. It seems like it could stop early. Perhaps the only way to tell is to have the Slowlog turned on, and see if this query comes up there, and look for "Rows examined". That value is the total number of rows (rows of the index, since it is "Using index") in both tables that needed to be looked at. If everything is just right (including NULLness of imageUrl), the value could be as low as 2 (LIMIT 1, but hitting both tables).

For this case, building indexes to achieve "Using index" seems reasonable. For other cases, there are caveats:
* There is a limit to the size (bytes wide) of an index, so you might not be able to add all the columns necessary.
* Extra indexes slows down INSERTs. (This is usually not a big deal, but it can be in huge tables with random indexes.)
* Extra indexes and wide indexes take extra disk space. (Again, rarely a big deal.)

> I think I can do my indices correctly for all my joins.
Only if the index it starts with has the ORDER BY field(s) in the same order. I often encounter JOIN-WHERE-ORDER-LIMIT where there is no way to prevent the sort.

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: ideal indexing for inner join, rowcount of 1
605
March 12, 2011 10:19AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.