Re: performance issue
Posted by:
Rick James
Date: March 04, 2015 08:04PM
This does not look like the traditional Data Warehouse table, so I don't have a quick answer. However, there might be a possibility of building Summary Tables.
What is the dataflow? Do you increment file_2grams_freq in a bunch of rows whenever a new document comes in?
If the answer is no, and file_2grams_freq does not change much, then:
To help the SELECT you mentioned, change
> KEY (`text_topic_text_medium_id_text_medium`),
to this compound, covering, index:
INDEX(text_topic_text_medium_id_text_medium, `2grams_id_2grams`, file_2grams_freq)
If file_2grams_freq changes frequently, then that index would probably be ill-advised. Instead, add
INDEX(text_topic_text_medium_id_text_medium, `2grams_id_2grams`)
which won't be 'covering', but will, I think, be better than the indexes you have now.
That should help that one query; it will not help much else. Would you like to list some more 'slow' queries?
I am puzzled by something:
> PRIMARY KEY (`id_file_2grams`, `file_control_id_file_control`, `2grams_id_2grams`),
> UNIQUE KEY `id_file_2grams_UNIQUE` (`id_file_2grams`),
Having both of those seems strange; a PRIMARY KEY is, by definition, UNIQUE. Yet the UNIQUE key says that only the first part is unique, making the UNIQUEness constraint in the PK moot.