searches on utf8_unicode_ci only work for ASCII
Posted by: Jorge Hernández
Date: July 06, 2010 05:25AM

I cannot get searches with collation utf8_unicode_ci to work reliably when trying accented latin characters. phpMyAdmin reports that "MySQL charset: UTF-8 Unicode (utf8)".

On the tiny testdb dumped below, the query:
SELECT * FROM  `testtable` WHERE  `textfield` LIKE  '%ñ%'
retuns ids 1,2,3,4,5,8 and 10, even though only ids 2 and 4 contain the character "ñ"

similarly;
SELECT * FROM  `testtable` WHERE  `textfield` LIKE  '%eñe%'
returns ids 2 and 4, even though only id 4 contains the string "eñe"

similarly:
SELECT * FROM  `testtable` WHERE  `textfield` LIKE  '%ò%'
returns a multitude of ids that do not contain the character "ò".

What am I missing?


testdb dump:
-- phpMyAdmin SQL Dump
-- phpMyAdmin SQL Dump
-- version 3.2.4
-- http://www.phpmyadmin.net
--
-- Host: localhost
-- Generation Time: Jul 06, 2010 at 08:50 AM
-- Server version: 5.1.44
-- PHP Version: 5.3.1

SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";


/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;

--
-- Database: `testdb`
--

-- --------------------------------------------------------

--
-- Table structure for table `testtable`
--

CREATE TABLE IF NOT EXISTS `testtable` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `textfield` varchar(64) COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=22 ;

--
-- Dumping data for table `testtable`
--

INSERT INTO `testtable` (`id`, `textfield`) VALUES
(1, 'Just plain ASCII '),
(2, 'This contains a Spanish eñe'),
(3, 'characters only found in English'),
(4, 'Muchísimas palabras en español contienen acentos'),
(5, 'Nothing fancy'),
(6, 'Història del turisme a les Illes Balears'),
(7, 'How about some “curly quotes”?'),
(8, '¿Do you open question marks in Spanish?'),
(9, 'Català has grave tildes'),
(10, 'Dropping some 日本語 in here');

Options: ReplyQuote


Subject
Views
Written By
Posted
searches on utf8_unicode_ci only work for ASCII
3699
July 06, 2010 05:25AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.