MySQL Forums
Forum List  »  Newbie

Re: Benefits of indexing
Posted by: Rick James
Date: April 10, 2014 03:35PM

Excuse me while I vent (briefly)...
<nasty-comment>
I hate it when people say "All the tables have the correct indexes on." It usually means that they are a novice and have not learned about "compound indexes".
</nasty-comment>

<more-polite>
Please provide SHOW CREATE TABLE and the actual SELECTs; I would like to evaluate the indexes, and possibly recommend changes.
</more-polite>

> One has over a million. The question is - would we get an improvement if the tables were significantly smaller?

* The short answer is: No.
* The more accruate answer is: Not appreciably. This gets into the depth of the BTree. 10K rows probably has a BTree depth of 2; a million rows probably has 3. That is, for a "point query" (finding _one_ row with the _perfect_ index) will grow from hitting 2 blocks in the index to hitting 3. These blocks are usually cached, and the total time is likely to be so few milliseconds that 2 vs 3 does not matter.
* The exception: If the index is not so good, or you need lots of rows, or lots of other cases... You may need to scan the entire table ("table scan") or at least an entire index ("index scan"). This is slow, and is proportional to the number of rows. So in _this_ case, the answer is: Yes.

Back to my nasty comment...
INDEX(last_name), INDEX(first_name) is _not_ the same as
INDEX(last_name, first_name).
For names, the latter ("compound") index is usually better. For some other situations, the former (two indexes) may be better.

> SELECT * FROM table1 LEFT JOIN table2 ON table1.field1 = table2.field1

That _particular_ example must be executed thus (apologies to Phillip for redundancy):
1. Do a table scan on table1,
2. Reach into table2 ("Nested Loop Join") for each row in table1.
2a. Hopefully, table2 has an INDEX (remember, a PRIMARY KEY is an INDEX) on field1. If so, this probe into table2 will be 'efficient'.

Step 1 is long if table1 is big.
Step 2 is fast, but it is invoked a lot (based on size of table1)

If you remove "LEFT", the optimizer is given the option to start with table2 instead of table1. The uptimizer will choose based on some black magic; it will usually choose the better ordering. All other things being equal, the optimizer is likely to start with the smaller table.

Another note on NLJ: MySQL almost always starts (a JOIN) with one table, then does NLJ into the next, then then next, etc. In _rare_ cases, it will use an "index merge"; usually an improved index will outperform such.

> DESC:

Please use SHOW CREATE TABLE; it is more DESCriptive than DESCribe.

> table2.field1 has one index on it

So, that falls into my step "2a".

> Why is it not using an index for table1?

When an index is used, the code needs to bounce back and forth between the index (a BTree in one place) and the "data" (in a different place). If more than ~20% of a table needs to be fetched, the optimizer will usually say 'screw all that bouncing' and simply scan the data ("table scan").

But there is another reason for not using any index on table1... You have no constraint on table1; you are asking for _all_ the rows _always_. (Note: I am refering to your "table1 LEFT JOIN table2" example.)

> Could it be because one of the columns is NULL and the other NOT NULL. And/or one is an unsigned int and one is a signed int?

Again, I need to see SHOW CREATE TABLE.
SIGNED vs UNSIGNED and other type differences _may_ invalidate ability to use an index, but perhaps not in this case. Still, it is wise to get the types the same.
NULL vs NOT NULL does not (I think) hinder index usage.
The EXPLAIN says it did use table2's PRIMARY KEY:
type: eq_ref key: PRIMARY key_len: 4 rows: 1

Here is a minor variation:
SELECT * FROM table1 LEFT JOIN table2 ON table1.field1 = table2.field1 WHERE table1.xyz = 123
Now this would be optimal:
1. table1: INDEX(xyz) -- this would efficiently find 123 and filter down to very few rows to work with
2. table2: INDEX(field1) (as before) -- efficiently do the NLJ -- but not have to do it much due to filtering in step 1.
I would happily do that query on a table with a billion rows.
Exception: If xyz were not UNIQUE, and there were a lot of rows =123, then it could be slow -- but mostly proportional to the number of 123s in table1.

Back to your question about table size. I suggest it is not the main question. The right question is "how many rows (in the index and/or data) does the query execution have to touch?" That, together with some understanding about BTrees and NLJ, will tell you when table size matters and when it does not.

> would we get an improvement if the tables were significantly smaller?

Just in case you are thinking about barking up these trees:
* Do NOT turn a big table into a bunch of small tables. That is almost always worse -- in many ways.
* Do NOT PARTITION a big table just because it is big. There are only a small number of reasons to use PARTITIONing; bigness, itself, is not one of them.

Further reading:
Indexes, especially 'compound': http://mysql.rjweb.org/doc.php/index1
Soundbites on indexing: http://mysql.rjweb.org/doc.php/ricksrots#indexing
Partitioning: http://mysql.rjweb.org/doc.php/partitionmaint

Options: ReplyQuote


Subject
Written By
Posted
April 09, 2014 09:26AM
April 09, 2014 12:38PM
April 09, 2014 02:21PM
April 10, 2014 04:18AM
April 10, 2014 05:36AM
Re: Benefits of indexing
April 10, 2014 03:35PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.