> look for all the where key and the order by combine them as one index even how many fields it could be
Close. Be sure to have the '=' or 'IN' things first. It is not useful to have more than one other field last in the index. Example
WHERE a = ... AND b > ... ORDER BY c
INDEX(a, b) is optimal
* a is first because of '='
* WHERE fields take priority over GROUP BY and ORDER BY, so 'b' is useful, but not 'c'.
* That is, INDEX(a, b, c) would be ok, but 'c' would not be used.
Exception: There are cases where the following would be best. (Sorry, I cannot explain the cases.)
INDEX(c)
That is, the optimizer decides that fetching the fields in sorted order (c) is more important than filtering (on a and b). It won't hurt to include that INDEX, then use EXPLAIN to see if it is used.
> useless when we are looking it via the like 'J%'?
My bad.
It is not completely "useless", but INDEX(last_name, first_name) will only find last_names beginning with J, without making any use of first_name="Rick". The point is that this would be better for that case:
INDEX(first_name, last_name)
The 'rule' is to have the '=' fields first in the INDEX.
JOINs are more involved. First, the optimizer has to decide which table to start with (unless you are using LEFT JOIN). It will say to itself:
1. Let's see how efficient it might be to start with table1.
2. Let's see how efficient it might be to start with table2.
Now, it will decide which way to do it.
The index for the first table (that it picked) follows the guidelines already outlined. The second table is JOINed ON some field; that field becomes part of the decision making for the second table. Think of the ON field as being '='. Example:
FROM tbl1
JOIN tbl2 ON tbl1.id = tbl2.id
WHERE tbl2.b = 345 AND tbl2.c LIKE 'J%' ...
For tbl2, either of these would be useful if tbl2 is done _second_:
INDEX(id, b, c)
INDEX(b, id, c)
But... If the Optimizer decides to use tbl2 _first_, then
INDEX(b, c)
is likely to be the 'best'.
So, if you can't figure out which table will be picked first, have two indexes so that the optimizer is free to pick:
INDEX(b, id, c) or (id, b, c), and
INDEX(b, c)
(Again, order of the fields is important.)
To do a complete job, you need to find all the SELECTs, UPDATEs, DELETEs, on a table, and decide what is the minimal set of indexes that handle all the cases. (Do not simply 'index every field'.)
> chunk deletes but there is a risk of table locking right
DELETEing a 'chunk' of 1000 rows is usually so fast that users do not notice the impact. DELETEing a million rows in a single statement (or single transaction) will impact other users. DROP PARTITION has no impact. (I started that document before PARTITIONing was common.) So, PARTITIONing is optimal, but only when it is otherwise appropriate. One Rule-of-Thumb: Don't PARTITION a table with fewer than a million rows. More RoTs:
http://mysql.rjweb.org/doc.php/ricksrots