MySQL :: Problem with LIKE on an index over a self-built UTF8 collation for 5.0


Problem with LIKE on an index over a self-built UTF8 collation for 5.0

Posted by: Nicolas Kratz
Date: November 08, 2010 06:29AM

Hello everyone.

To get something similar to utf8_german2_ci under 5.0, we built our own collation using the information from http://bugs.mysql.com/bug.php?id=38758 and http://dev.mysql.com/doc/refman/5.0/en/adding-collation.html and added it to /usr/share/mysql/charsets/Index.xml, see below. It only sorts the umlauts after their base vowels; ß/ss is left untouched. At first this worked flawlessly.

Now, however, LIKE on indexed varchar columns no longer behaves as expected. Can anyone explain how the collation affects the way the index is built and read? Searches only return correct results via a table scan, i.e. with ignore index or with LIKE '%...'.

See the test case below, and note the empty sets as soon as each value is in the table twice. I don't understand it.

Thanks and best regards,
Nicolas Kratz

------------------------------

$ mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 77
Server version: 5.0.51a-24+lenny4 (Debian)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

(nick@localhost) [(none)]> set names utf8;
Query OK, 0 rows affected (0.00 sec)

(nick@localhost) [(none)]> select version();
+-------------------+
| version()         |
+-------------------+
| 5.0.51a-24+lenny4 | 
+-------------------+
1 row in set (0.00 sec)

(nick@localhost) [(none)]> use test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
(nick@localhost) [test]> drop table if exists t1;
Query OK, 0 rows affected (0.02 sec)

(nick@localhost) [test]> create table t1 (
    ->     id int primary key auto_increment,
    ->     t varchar(64) charset utf8 collate utf8_rsm_ci,
    ->     index t_idx (t)
    -> ) engine=innodb charset latin1;
Query OK, 0 rows affected (0.02 sec)

(nick@localhost) [test]> insert into t1 (t) values ('test'),('täst'),('tast');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

(nick@localhost) [test]> select * from t1 where t like 'tä%';
+----+-------+
| id | t     |
+----+-------+
|  2 | täst  | 
+----+-------+
1 row in set (0.00 sec)

(nick@localhost) [test]> select * from t1 where t like 'ta%';
+----+------+
| id | t    |
+----+------+
|  3 | tast | 
+----+------+
1 row in set (0.00 sec)

(nick@localhost) [test]> insert into t1 (t) values ('test'),('täst'),('tast');
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

(nick@localhost) [test]> select * from t1 where t like 'tä%';
Empty set (0.00 sec)

(nick@localhost) [test]> select * from t1 where t like 'ta%';
Empty set (0.00 sec)

(nick@localhost) [test]> select * from t1 ignore index (t_idx) where t like 'ta%';
+----+------+
| id | t    |
+----+------+
|  3 | tast | 
|  6 | tast | 
+----+------+
2 rows in set (0.00 sec)

(nick@localhost) [test]> select * from t1 ignore index (t_idx) where t like 'tä%';
+----+-------+
| id | t     |
+----+-------+
|  2 | täst  | 
|  5 | täst  | 
+----+-------+
2 rows in set (0.00 sec)

------------------------------
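
One way to narrow this down, offered as a sketch rather than a diagnosis: the wrong results only show up once the table holds six rows, which would fit the optimizer switching from a full scan to a range scan on t_idx. That is an assumption, and EXPLAIN on the test table above should show which access path is really used. The second table, t2, is a hypothetical name introduced only for this comparison; it repeats the test with the built-in utf8_unicode_ci collation.

```sql
-- Which access path does the failing query take?
-- Check the "type" and "key" columns of the output.
explain select * from t1 where t like 'ta%';

-- The same query on the path that returns the expected rows.
explain select * from t1 ignore index (t_idx) where t like 'ta%';

-- Force the index to see whether the index path itself loses the rows.
select * from t1 force index (t_idx) where t like 'ta%';

-- Comparison table (hypothetical name t2) with a built-in collation.
-- If LIKE works here through the index, the problem is specific to utf8_rsm_ci.
drop table if exists t2;
create table t2 (
    id int primary key auto_increment,
    t varchar(64) charset utf8 collate utf8_unicode_ci,
    index t_idx (t)
) engine=innodb charset latin1;
insert into t2 (t) select t from t1 order by id;
select * from t2 force index (t_idx) where t like 'ta%';
```

Note that utf8_unicode_ci treats a and ä as equal, so the row sets on t2 will differ from those on t1; the point of the comparison is only whether the indexed LIKE returns rows at all.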

<charset name="utf8">
  <family>Unicode</family>
  <description>UTF-8 Unicode</description>
  <alias>utf-8</alias>
  <collation name="utf8_general_ci" id="33">
    <flag>primary</flag>
    <flag>compiled</flag>
  </collation>
  <collation name="utf8_bin"        id="83">
    <flag>binary</flag>
    <flag>compiled</flag>
  </collation>
  <collation name="utf8_rsm_ci" id="99">
    <rules>
      <reset>a</reset>
      <p>\u00E4</p> <!-- ä -->
      <reset>A</reset>
      <p>\u00C4</p> <!-- Ä -->
      <reset>o</reset>
      <p>\u00F6</p> <!-- ö -->
      <reset>O</reset>
      <p>\u00D6</p> <!-- Ö -->
      <reset>u</reset>
      <p>\u00FC</p> <!-- ü -->
      <reset>U</reset>
      <p>\u00DC</p> <!-- Ü -->
    </rules>
  </collation>
</charset>
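
Before looking at the index, it may be worth confirming that the server has loaded the collation from Index.xml and applies the tailoring as intended. The statements below are a minimal sketch of such a check; they assume the connection uses utf8 (as in the test case) and the column aliases are made up for readability.

```sql
set names utf8;

-- Is the collation registered, and with the id given in Index.xml (99)?
show collation like 'utf8_rsm_ci';

-- With the rules above, ä should sort after a but still before b.
-- Both columns should be 1 if the tailoring is applied as intended.
select 'ä' > 'a' collate utf8_rsm_ci as ae_after_a,
       'ä' < 'b' collate utf8_rsm_ci as ae_before_b;

-- Sort order on the test table, once via filesort and once via the index.
select t from t1 ignore index (t_idx) order by t;
select t from t1 force index (t_idx) order by t;
```

If the two ORDER BY results differ, that would point at the index being built or read with a different ordering than the one used for plain comparisons.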

-- edit: code tags added.

Edited 1 time(s). Last edit at 11/08/2010 08:09AM by Nicolas Kratz.
