I am attempting to perform a restoration of a mysqldump from a 5.0 server into a new 5.5 server. This data is currently in UTF8 format, and seems to be functioning correctly on the 5.0 machine:
#
# Mysql 5.0
#
mysql> select * from search_total where word like "weiß" \G
*************************** 1. row ***************************
word: weiß
count: 0.138662
1 row in set (0.00 sec)
mysql> select * from search_total where word like "weis" \G
*************************** 1. row ***************************
word: weis
count: 0.30103
1 row in set (0.00 sec)
mysqldump generates the following insert statements:
INSERT INTO `search_total` VALUES ('weis',0.30103);
INSERT INTO `search_total` VALUES ('weiß',0.138662);
However, running these insert statements on the 5.5.9 system provides an error:
mysql> INSERT INTO `search_total` VALUES ('weis',0.30103);
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO `search_total` VALUES ('weiß',0.138662);
ERROR 1062 (23000): Duplicate entry 'weiß' for key 'PRIMARY'
mysql>
It seems the insert statement is translating weiß into weis, then checking against the index, whose contents do not seem to be stored in utf8
For example, if I reverse the order of the inserts, I receive the same error. The first entry should store the UTF8 value
mysql> INSERT INTO `search_total` VALUES ('weiß',0.138662);
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO `search_total` VALUES ('weis',0.30103);
ERROR 1062 (23000): Duplicate entry 'weis' for key 'PRIMARY'
mysql>
I've tried different collations (which affect selects but shouldn't affect plain inserts like these) dumping and importing as latin1, and even inserting the result from hex values:
mysql> insert into `search_total` values ((select 0x77656973), .111) \G
Query OK, 1 row affected (0.00 sec)
mysql> insert into `search_total` values ((select 0x776569C39F), .111)\G
ERROR 1062 (23000): Duplicate entry 'weiß' for key 'PRIMARY'
mysql> select * from search_total \G
*************************** 1. row ***************************
word: weis
count: 0.111
1 row in set (0.00 sec)
mysql>
What do I need to do in 5.5.9 to get this data imported and stored correctly?
Databases are set to /*!40100 DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci */
The table as it exists on 5.0:
mysql> show create table search_total \G
*************************** 1. row ***************************
Table: search_total
Create Table: CREATE TABLE `search_total` (
`word` varchar(50) NOT NULL default '',
`count` float default NULL,
PRIMARY KEY (`word`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql> show session variables like "coll%" \G
*************************** 1. row ***************************
Variable_name: collation_connection
Value: utf8_general_ci
*************************** 2. row ***************************
Variable_name: collation_database
Value: utf8_general_ci
*************************** 3. row ***************************
Variable_name: collation_server
Value: utf8_general_ci
3 rows in set (0.00 sec)
mysql>
#
# And on mysql 5.5.9
#
mysql> show create table search_total \G
*************************** 1. row ***************************
Table: search_total
Create Table: CREATE TABLE `search_total` (
`word` varchar(50) NOT NULL DEFAULT '',
`count` float DEFAULT NULL,
PRIMARY KEY (`word`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql>
mysql> show session variables like "coll%" \G
*************************** 1. row ***************************
Variable_name: collation_connection
Value: utf8_unicode_ci
*************************** 2. row ***************************
Variable_name: collation_database
Value: utf8_unicode_ci
*************************** 3. row ***************************
Variable_name: collation_server
Value: utf8_unicode_ci
3 rows in set (0.01 sec)