MySQL :: Re: What is quicker for searching out of these two?

New Topic

Re: What is quicker for searching out of these two?

Posted by: Rick James
Date: January 27, 2013 12:01AM

50 columns is rarely a good idea. A column (even CHAR(0) NULL) takes some space and other overhead. INDEXes on lots of columns is bulky and often not useful.

I would probably do
`decade` ENUM ('unclassified', '40s', '50s', ...) NOT NULL
That would take 1 byte (assuming no more than 255 'decades').

Back to your speed question...

First, note that an index will not be used for a non-selective field. For example, if 20% of the table is 50's, then INDEX(decade) will be ignored, and a table scan will be done for
SELECT ... FROM tbl WHERE decade = '50s';
My example assumes the ENUM, but the answer is the same for the encodings of `decade` that you suggested.

Is `genre` another set of tags? Can a song be in only one genre? Or can it be in many?

Let's assume a song can have more than one `genre`. Consider the SET datatype. A set of 32 possibilities would occupy 4 bytes. But an INDEX on a SET is probably useless.

Actually, it may be better to make another table:
CREATE TABLE SongGenres (
song_id ...,
genre ...,
PRIMARY KEY (genre, song_id),
INDEX(song_id, genre)
) ENGINE=InnoDB;
That is a many-to-many mapping (relationship). You would JOIN to your main table, while doing
WHERE SongGenres.genre = 'country'
Going the other way,
SELECT ... GROUP_CONCAT(SongGenres.genre) ... FROM...JOIN...
would let you list all the genres for a song when fetching it.

Bottom line: the char(0) method is not a good idea, at least not for multiple, similar, columns.

> up to four would be queried at any one time
* MySQL rarely uses more than _one_ INDEX in a query.
Those two statements leads to having multiple "compound" indexes. Let's say the user can select one to four of decade, genre, x, y. Then the following _might_ be a useful set of indexes:
INDEX(decade, genre)
INDEX(genre, x)
INDEX(x, y)
INDEX(decade, x)
INDEX(genre, y)
INDEX(y, decade)
Six indexes is a civilized number; a hundred is not. These 4 would help the SELECT with some combinations, not all.

But, there is still a problem.
SELECT ... FROM tbl WHERE decade = '50s'; -- won't use any index.
WHERE decade = '50s' AND genre = 'classical'; -- will use the first index, and be efficient
WHERE decade = '50s' AND genre = 'classical' AND x = ... -- Also good.
WHERE decade = '50s' AND genre = 'classical' AND y = ... -- Potentially better -- the optimizer can pick between the first and last of the INDEXes.

My 6 indexes are probably not optimal. We need to look at the distribution of the values of each 4 searchable field.

For example, if one of the fields is `lead_singer`, ...
(I avoided `singer` because that would be a many-to-many situation.)
INDEX(lead_singer)
would probably be good. Note that it has just the one field. It is a very selective field. So even if the user asks for "lead_singer = '... AND x = ...", the x does not add much. So don't bother compounding lead_singer with anything.
Anyway, how many singers sing in more than one decade or genre? OK, the SELECT will have to work a little harder than if you had INDEX(lead_singer, decade), but it is not worth it.

Navigate: Previous Message• Next Message

Options: Reply• Quote

Subject

Written By

Posted

What is quicker for searching out of these two?

Shaun Thomson

January 25, 2013 09:03AM

Re: What is quicker for searching out of these two?

Rick James

January 27, 2013 12:01AM

Re: What is quicker for searching out of these two?

Rick James

January 27, 2013 12:12AM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.