MySQL :: Re: Case sensitive vs non case sensitive performance

New Topic

Re: Case sensitive vs non case sensitive performance

Posted by: Rick James
Date: February 28, 2013 09:04AM

Sure, there is a difference. But it is minor.

The main cost in any operation if finding and fetching the records. Fetching a block from disk: 10ms. Comparing two strings (assuming a 2GHz CPU): nanoseconds. So, if actual I/O is involved, we are talking literally a million to one difference in time taken. If things are cached, we are still looking at a 1000:1.

From another point of view... Focus on things that give you bigger wins. Reformulating a SELECT can sometimes give you 10x (that's 10:1 in the _good_ direction) improvement. A Summary table gives 10:1. Adding an index might give 100:1. Etc.

This may be an extreme case (since utf8_unicode_ci may be the most complex collation):

mysql> SELECT BENCHMARK(100000000, _utf8'abc' < _utf8'xyz' COLLATE utf8_bin);
+----------------------------------------------------------------+
| BENCHMARK(100000000, _utf8'abc' < _utf8'xyz' COLLATE utf8_bin) |
+----------------------------------------------------------------+
|                                                              0 |
+----------------------------------------------------------------+
1 row in set (3.37 sec)

mysql> SELECT BENCHMARK(100000000, _utf8'abc' < _utf8'xyz' COLLATE utf8_unicode_ci);
+-----------------------------------------------------------------------+
| BENCHMARK(100000000, _utf8'abc' < _utf8'xyz' COLLATE utf8_unicode_ci) |
+-----------------------------------------------------------------------+
|                                                                     0 |
+-----------------------------------------------------------------------+
1 row in set (9.42 sec)

That is,
34ns for one comparison in utf8_bin
94ns for one comparison in utf8_unicode_ci

If I am sorting a million rows, it will take many seconds to shovel around the million rows, but it will take only a fraction of a second to do the N*logN string comparisions. And only a fraction of a second extra for a more complex collation.

Caveats:
* My test case can decide in the first character
* My test case has no accented characters
* An unknown amount of overhead is involved in running BENCHMARK.
* etc. (It is all too easy to fabricate a benchmark to 'prove' a point.)

Navigate: Previous Message• Next Message

Options: Reply• Quote

Subject

Views

Written By

Posted

Case sensitive vs non case sensitive performance

5474

Kjeld Flarup

February 27, 2013 03:29AM

Re: Case sensitive vs non case sensitive performance

2078

Rick James

February 28, 2013 09:04AM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.