Re: What is quicker for searching out of these two?
Posted by:
Rick James
Date: January 27, 2013 12:01AM
50 columns is rarely a good idea. A column (even CHAR(0) NULL) takes some space and other overhead. INDEXes on lots of columns is bulky and often not useful.
I would probably do
`decade` ENUM ('unclassified', '40s', '50s', ...) NOT NULL
That would take 1 byte (assuming no more than 255 'decades').
Back to your speed question...
First, note that an index will not be used for a non-selective field. For example, if 20% of the table is 50's, then INDEX(decade) will be ignored, and a table scan will be done for
SELECT ... FROM tbl WHERE decade = '50s';
My example assumes the ENUM, but the answer is the same for the encodings of `decade` that you suggested.
Is `genre` another set of tags? Can a song be in only one genre? Or can it be in many?
Let's assume a song can have more than one `genre`. Consider the SET datatype. A set of 32 possibilities would occupy 4 bytes. But an INDEX on a SET is probably useless.
Actually, it may be better to make another table:
CREATE TABLE SongGenres (
song_id ...,
genre ...,
PRIMARY KEY (genre, song_id),
INDEX(song_id, genre)
) ENGINE=InnoDB;
That is a many-to-many mapping (relationship). You would JOIN to your main table, while doing
WHERE SongGenres.genre = 'country'
Going the other way,
SELECT ... GROUP_CONCAT(SongGenres.genre) ... FROM...JOIN...
would let you list all the genres for a song when fetching it.
Bottom line: the char(0) method is not a good idea, at least not for multiple, similar, columns.
> up to four would be queried at any one time
* MySQL rarely uses more than _one_ INDEX in a query.
Those two statements leads to having multiple "compound" indexes. Let's say the user can select one to four of decade, genre, x, y. Then the following _might_ be a useful set of indexes:
INDEX(decade, genre)
INDEX(genre, x)
INDEX(x, y)
INDEX(decade, x)
INDEX(genre, y)
INDEX(y, decade)
Six indexes is a civilized number; a hundred is not. These 4 would help the SELECT with some combinations, not all.
But, there is still a problem.
SELECT ... FROM tbl WHERE decade = '50s'; -- won't use any index.
WHERE decade = '50s' AND genre = 'classical'; -- will use the first index, and be efficient
WHERE decade = '50s' AND genre = 'classical' AND x = ... -- Also good.
WHERE decade = '50s' AND genre = 'classical' AND y = ... -- Potentially better -- the optimizer can pick between the first and last of the INDEXes.
My 6 indexes are probably not optimal. We need to look at the distribution of the values of each 4 searchable field.
For example, if one of the fields is `lead_singer`, ...
(I avoided `singer` because that would be a many-to-many situation.)
INDEX(lead_singer)
would probably be good. Note that it has just the one field. It is a very selective field. So even if the user asks for "lead_singer = '... AND x = ...", the x does not add much. So don't bother compounding lead_singer with anything.
Anyway, how many singers sing in more than one decade or genre? OK, the SELECT will have to work a little harder than if you had INDEX(lead_singer, decade), but it is not worth it.