MySQL Forums
Forum List  »  InnoDB

Re: Larger than expected tmp files querying InnoDB format database
Posted by: Rick James
Date: August 15, 2014 03:49PM

> Could you briefly describe why the query in question was behaving badly?

No, because:
* It is not just one cause. (I have mentioned several causes; probably not all.)
* It is hard to dig deeper without having a 'feel' for the data, its size, the distribution of keys, avg/max length of fields in VARCHAR(255) fields, EXPLAIN plan before and after, changes in SHOW VARIABLES LIKE 'Handler%' across the SELECT, etc, etc.

> I will copy/paste that feedback into the CiviCRM forum to conclude that discussion thread there.

Several things I have said can be pasted there. Maybe my best advice for them is below. I would be happy to have a dialog with them.

> perhaps their optimization issue is common for many of their queries

Perhaps; perhaps not.

Re: 2GB tmp file: Here are some techniques that work in _some_ situations. (I cannot predict whether they would help in this situation.)

Example 1:
SELECT ... FROM JOIN JOIN ...
-->
SELECT
( SELECT ... FROM ... ), ...
FROM JOIN ...

That is, turn one (or more) JOINs into subqueries (probably 'correlated') in the SELECT. (Example 1 probably won't help here, so I will say nothing further about it.)

Example 2:

SELECT ...
FROM ( SELECT ... ORDER BY + LIMIT ) x -- minimal number of tables
JOIN ...
WHERE / GROUP BY -- no LIMIT here; ORDER BY may or may not be needed again

That is, move the LIMIT into a subquery (in FROM or JOIN). This _may_ do a lot of work more easily in the subquery (which has fewer tables), then do the LIMIT. Note that the tmp table(s) in the subquery _may_ be much smaller than yours, while the tmp table(s) outside will be trivial because the LIMIT has already been applied, thereby diminishing the number of rows to deal with.

Example 3:

Like Example 2, but note that the inner query may have some of the same table(s) as the outer. The trick here is to use "Using index" in the inner query when hitting lots of rows; while not worrying about that optimization outside because of fewer rows.

"Using index" is the clue in EXPLAIN that means that it can perform the SELECT (JOIN, etc) by looking only at an INDEX, not the DATA. This is _usually_ more efficient (avg = 2x; occasionally 10x faster).

OK, maybe I can give you (them) a clue of what to try:
1. Devise a query that gets to the point of doing the LIMIT. Collect as little as possible during this query. That is, collect ids (PRIMARY KEY values), but nothing else (unless it is 'free').
2. Put that into "FROM ( SELECT ... LIMIT )" as the starter (middle) of the final query.
3. Now JOIN that to whatever is needed to find the other values needed. (Hence, my point in Ex 3 about JOINing back to the same tables.)
4. Review the part of the EXPLAIN plan for the "FROM(...)". See if it is "Using index" for all of those table(s). If not, consider adding new indexes. (Good indexes are most important here.)
5. Review the "Rows" column in the EXPLAIN plan. Big number are a no-no (usually). Consider adding indexes, or perhaps rethinking the schema.
6. "ORDER BY ... LIMIT ..." has a better chance of being optimized if the ORDER BY fields match some index, and that index is in the "first" table in the subquery part of the EXPLAIN.

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: Larger than expected tmp files querying InnoDB format database
1097
August 15, 2014 03:49PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.