I cannot get searches with collation utf8_unicode_ci to work reliably when trying accented latin characters. phpMyAdmin reports that "MySQL charset: UTF-8 Unicode (utf8)".
On the tiny testdb dumped below, the query:
SELECT * FROM `testtable` WHERE `textfield` LIKE '%ñ%'
retuns ids 1,2,3,4,5,8 and 10, even though only ids 2 and 4 contain the character "ñ"
similarly;
SELECT * FROM `testtable` WHERE `textfield` LIKE '%eñe%'
returns ids 2 and 4, even though only id 4 contains the string "eñe"
similarly:
SELECT * FROM `testtable` WHERE `textfield` LIKE '%ò%'
returns a multitude of ids that do not contain the character "ò".
What am I missing?
testdb dump:
-- phpMyAdmin SQL Dump
-- phpMyAdmin SQL Dump
-- version 3.2.4
-- http://www.phpmyadmin.net
--
-- Host: localhost
-- Generation Time: Jul 06, 2010 at 08:50 AM
-- Server version: 5.1.44
-- PHP Version: 5.3.1
SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
--
-- Database: `testdb`
--
-- --------------------------------------------------------
--
-- Table structure for table `testtable`
--
CREATE TABLE IF NOT EXISTS `testtable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`textfield` varchar(64) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=22 ;
--
-- Dumping data for table `testtable`
--
INSERT INTO `testtable` (`id`, `textfield`) VALUES
(1, 'Just plain ASCII '),
(2, 'This contains a Spanish eñe'),
(3, 'characters only found in English'),
(4, 'Muchísimas palabras en español contienen acentos'),
(5, 'Nothing fancy'),
(6, 'Història del turisme a les Illes Balears'),
(7, 'How about some “curly quotes”?'),
(8, '¿Do you open question marks in Spanish?'),
(9, 'Català has grave tildes'),
(10, 'Dropping some 日本語 in here');