MySQL Forums
Forum List  »  InnoDB

Re: Larger than expected tmp files querying InnoDB format database
Posted by: Rick James
Date: July 11, 2014 12:06PM

First come comments about the schema...

* 15-way JOIN -- Ugh! I'll focus on that in a minute.

* 15 indexes for civicrm_contact -- that is a lot, and creates overhead on INSERTs.

* The 15 indexes are each on a single field. It is as if it expects you to look up rows by first_name alone, or prefix_alone. Is this realistic? I suspect not. Normally this would be much better
INDEX(last_name, first_name)
instead of
INDEX(last_name), INDEX(first_name)
Note: MySQL almost never uses more than one INDEX at a time for a table; when it does, it is less efficient than if you had had a "compound" index.

* Tables start with "civicrm_" -- clutters the queries with no benefit. Similarly for fields. (I understand that CiviCRM, not you, is probably doing that.)

* Normalization of "Mr.", "Jr", etc -- a really bad idea. (Again, I understand it was not your choice.)

* As already guessed:
A guess -- you have a lot of VARCHAR(255) fields that turned into CHAR(255) in the tmp table.
and also a not of (128). Furthermore, they are utf8, therefore when it becomes CHAR(255), it takes 765 bytes.
Note the "Key_len" in the Explain.

* While I see several TEXT fields, I cannot (yet) explain the megabyte per contact you are seeing:
> a simple export of 2379 contacts ALSO generates a 2GB /tmp file
How did you export it? If it was mysqldump, take a look into the file and see if you can find a megabyte of stuff for one "contact".

* INDEX `index_is_deleted`(is_deleted) -- Indexing flags is almost guaranteed to be useless.

* Please provide SHOW TABLE STATUS -- I want to see how big Data_length and Index_length are for each table, plus the estimated number of Rows.

* `hash` varchar(32) COMMENT 'Key for validating requests related to this contact.',
Sounds like an MD5 expressed in ascii. Using utf8 is a waste. Using VAR is a waste.

* `tax_rate` decimal(10,8) COMMENT 'The percentage of the total_amount that is due for this tax.',
Don't try this in Egypt -- they are taxing some things at 200%. 200 won't fit in that field.

* `price` decimal(20,2) -- That is fine for the Dollar and the Euro. But some currencies need 0, some need 4, decimal places. 18 places to the left -- I don't think the worst hyper-inflation has approached that large a number. Oops, I stand corrected: http://en.wikipedia.org/wiki/Hyperinflation gives a few cases where 18 digits is not enough.

* GROUP BY civicrm_contribution.id ORDER BY receive_date desc, contact_a.id LIMIT 0, 10
This is almost guaranteed to cause two temporary tables, one for the GROUP BY, another for the ORDER BY. And all the data must be carried though the temp tables before getting to the LIMIT.

* Oh, even worse -- please try this:
SELECT COUNT(*) FROM ... JOIN ... WHERE ... -- no other fields selected; do all the joins; keep the WHERE; but no GROUP BY; no ORDER BY; no LIMIT. I suspect it will deliver a ridiculously large number. If so, that is because the JOINs are gathering extra combinations before getting to the GROUP BY which boils it down to the desired rows.

* If you don't get a huge number for COUNT(*), then one could argue that the GROUP BY is "wrong". See what happens with the original statement, removing only the GROUP BY.

* ORDER BY receive_date desc, contact_a.id
MySQL cannot optimize a compound ORDER BY where one field is DESC and another is ASC.
Nor can it optimize an ORDER BY that references two tables.
Yes it will do it, but by brute force (create tmp table, sort it, etc).

* The EXPLAIN shows reaching for about 24322 rows in civicrm_contribution. If it were doing it first, that might be less of a problem. (It's hard to see what is going on.)

* INNER JOIN civicrm_option_group cog ON cog.id = cov.option_group_id AND cog.name = 'account_relationship'
(and a few others) -- This smells like an ER pattern. It can be terribly inefficient; it is obviously clumsy due to the number of tables and JOINs.

* AARGGHH -- civicrm_option_group is mentioned several times, but not used! Especially in the LEFT JOINs, all it does it waste time. Can you remove the unnecessary LEFT JOINs? The INNER JOIN is checking that there is an account_relationship, but that may be unnecessary, too.

At first, the schema looked not-too-bad. Now I am feeling ill.
This is the price people pay for 3rd party software.

Back to your problem...
The complexity of the statement is the root cause. Does your peer have the identical schema? And similar-sized tables?

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: Larger than expected tmp files querying InnoDB format database
1174
July 11, 2014 12:06PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.