Hi,
We have an app with 101 tables that need a review with respect to indexes and queries for performance reasons.
I therefore enabled the slow log, and set out to eradicate all entries in it.
Now I discover, that:
1: Sometimes "explain $query" shows that an index is used (all key columns are
NOT NULL), when in fact "$query" shows up in the slow log exactly iff
log-queries-not-using-indexes is set in /etc/mysql/my.cnf. (See below for
example) Why is that?
2: Whether or not an index is used depends on many things, including:
2a: What (number of) rows that are in each table involved
2b: What columns are selected
3: Sometimes even if indexes are used, I've seen that another set of indexes
can speed up the query 10-fold. How can I identify such cases?
select Version() is "5.1.49-3-log" from mysql-server-5.1 5.1.49-3 on debian stable
A couple of questions:
Especially 1: above, that "explain $query" and the slow log disagree, really makes it difficult for me to move on. How can I get a list of which queries don't use indexes? Am I doing something wrong here?
Are there better tools to analyse queries than "explain extended $query"? (I hope so)
My naive attempt at an approach was: Take a backup/mysqldump from a representative large installation. Analyse its slow log. Make sure that queries on that large installation don't show up in the slow log (that doesn't work because of 1: above). Does that make sense? Are there better approaches? Unfortunately my previous approach: Run it all on my developer machine with a tiny debug database doesn't work...
Since which index is used depends on so many things, I guess for the developer optimising queries and indexes is more of an experience-based trial-and-error sort of thing than something that it is predictable. Or what?
We're planning an MyISAM -> Innodb transition soon. Do the index performance characteristics change so that we should do the transition first and optimise post-transition? I guess that is a yes, but how much so?
Extra details and dumps, etc:
###############
About 1:
###############
CREATE TABLE `atable` (
`ID` int(11) NOT NULL auto_increment,
`otherID` int(11) NOT NULL default '0',
PRIMARY KEY (`ID`),
KEY `otherID` (`otherID`)
) ENGINE=MyISAM;
INSERT INTO `atable` VALUES (98,135),(97,134),(4,4),(5,5);
RESET QUERY CACHE; FLUSH TABLES;
-- This shows that key = "otherID" which I interpret to mean that the otherID
-- index will be used
EXPLAIN SELECT otherID FROM atable;
-- However, this still shows up in the show log exactly iff
-- log-queries-not-using-indexes is set in my.cnf
SELECT otherID FROM atable;
###############
About 2a:
###############
I have two database dumps that are too large for inlining, so I've put them up on github's gist for downloading [1]. See links at the bottom.
In the following, I'll provide the details for two databases with identical structure but different rows and a particular query, where explain says something different for the two databases, but even though explain in both cases assures me that indexes are used, the slow log still mentions these selects for some reason (as described above):
The tables involved are here and at [2]:
CREATE TABLE `IDs` (
`ID` int(11) NOT NULL default '0',
PRIMARY KEY (`ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `cfgMonitoredTableRows` (
`ID` int(11) NOT NULL,
`nodeID` int(11) NOT NULL default '0',
`monitoredTableID` int(11) NOT NULL default '0',
PRIMARY KEY (`ID`),
KEY `nodeTable` (`nodeID`,`monitoredTableID`),
KEY `mtrSelectInDtMonitoredTablesCron5` (`nodeID`,`ID`,`monitoredTableID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `cfgNodes` (
`ID` int(11) NOT NULL,
`managementState` enum('managed','silent','unmanaged') NOT NULL default 'managed',
PRIMARY KEY (`ID`),
UNIQUE KEY `selectByManagementState` (`managementState`,`ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `dtEventSources` (
`ID` int(11) NOT NULL,
`sourceType` enum('capmonStatus','nodeAlarm','serviceAlarm','statThresholdAlarm','interfaceStatusAlarm','newNodeEvent','syslogFilterMatchEvent','spanningTreeEvent','configBackupEvent','syslogFilterAlarm','configBackupAlarm') NOT NULL default 'nodeAlarm',
`currentState` enum('ok','pending','unknown','softWarning','warning','softCritical','critical','unmanaged') default NULL,
`nodeID` int(11) default NULL,
PRIMARY KEY (`ID`),
KEY `nodeID` (`nodeID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;j
The select involved can be found in [3]:
SELECT MTR.nodeID
FROM cfgMonitoredTableRows as MTR,
IDs,
cfgNodes as N
LEFT JOIN dtEventSources as ES ON
( ES.sourceType = "nodeAlarm" and ES.nodeID = N.ID )
WHERE MTR.ID = IDs.ID
AND MTR.nodeID = N.ID
AND ES.currentState != "critical"
AND N.managementState != "unmanaged";
For a "large" database [4] with "many" rows, explain tells me that a key is used for all tables in the select:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: N
type: index
possible_keys: PRIMARY,selectByManagementState
key: selectByManagementState
key_len: 5
ref: NULL
rows: 1814
filtered: 100.00
Extra: Using where; Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: ES
type: ref
possible_keys: nodeID
key: nodeID
key_len: 5
ref: large.N.ID
rows: 4
filtered: 100.00
Extra: Using where
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: MTR
type: ref
possible_keys: PRIMARY,nodeTable,mtrSelectInDtMonitoredTablesCron5
key: mtrSelectInDtMonitoredTablesCron5
key_len: 4
ref: large.ES.nodeID
rows: 47
filtered: 100.00
Extra: Using where; Using index
*************************** 4. row ***************************
id: 1
select_type: SIMPLE
table: IDs
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: large.MTR.ID
rows: 1
filtered: 100.00
Extra: Using index
But for the exact same structure in a "small" database [5] with "few" rows, explain tells me that a different set of keys will be used:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: IDs
type: index
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 118
filtered: 100.00
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: MTR
type: eq_ref
possible_keys: PRIMARY,nodeTable,mtrSelectInDtMonitoredTablesCron5
key: PRIMARY
key_len: 4
ref: small.IDs.ID
rows: 1
filtered: 100.00
Extra:
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: N
type: eq_ref
possible_keys: PRIMARY,selectByManagementState
key: PRIMARY
key_len: 4
ref: small.MTR.nodeID
rows: 1
filtered: 100.00
Extra: Using where
*************************** 4. row ***************************
id: 1
select_type: SIMPLE
table: ES
type: ref
possible_keys: nodeID
key: nodeID
key_len: 5
ref: small.MTR.nodeID
rows: 10
filtered: 103.83
Extra: Using where
Regardless of what explain says, they go in the slow log...
URLS:
1:
https://gist.github.com/1200395
2:
https://raw.github.com/gist/1200395/createTables.sql
3:
https://raw.github.com/gist/1200395/select.sh
4:
https://raw.github.com/gist/1200395/large.sql
5:
https://raw.github.com/gist/1200395/small.sql