Problem with fulltext search using encoded german umlauts (strange behaviour of "MATCH(column) AGAINST ('keyword' IN BOOLEAN MODE)")
Hi, i have a big problem with using the MySQL fulltext search with encoded german umlauts. I have build a testing playground to show you what problems i am encountering.
This test database is called "fulltext_search" and contains two tables:
1. tbl_plain_umlauts
2. tbl_html_umlauts
The only difference between those two tables is, that the umlauts used in the city names are either plain or encoded for html. The problem i have, relates on the encoded umlauts. For example an "ä" is encoded "ä". For that reason my fulltext search phrase has to be prepared before using it with MySQL, but preparation like replacing all umlauts in the encoded form will result in a strange set of results from MySQL...
I would really appreciate, if someone could check and test and at least give me a hint for solving this problem.
Here are the two test statements which i am using at the moment:
/* Performing a fulltext search on tbl_plain_umlauts (working/result is only dataset with content_id = 4) */
SELECT * FROM tbl_plain_umlauts WHERE MATCH(content_text) AGAINST ('Köln' IN BOOLEAN MODE);
/* Performing a fulltext search on tbl_html_umlauts (not working/returns all city names which contain a single "ö" respectively "ö") */
SELECT * FROM tbl_html_umlauts WHERE MATCH(content_text) AGAINST ('Köln' IN BOOLEAN MODE);
I believe this comes because of the ampersand "&", but i do not know, how to get rid of this... Anyone got any idea? Please!
If you would like to help me, you could use the create script below:
--------------------------------------------------------------------
-- phpMyAdmin SQL Dump
-- version 2.11.8.1deb5+lenny4
--
http://www.phpmyadmin.net
--
-- Host: localhost
-- Erstellungszeit: 19. April 2010 um 15:50
-- Server Version: 5.0.51
-- PHP-Version: 5.2.6-1+lenny8
SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
--
-- Datenbank: `fulltext_search`
--
CREATE DATABASE `fulltext_search` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
USE `fulltext_search`;
-- --------------------------------------------------------
--
-- Tabellenstruktur für Tabelle `tbl_html_umlauts`
--
CREATE TABLE IF NOT EXISTS `tbl_html_umlauts` (
`content_id` int(11) NOT NULL auto_increment,
`content_text` varchar(255) NOT NULL,
PRIMARY KEY (`content_id`),
FULLTEXT KEY `content_text` (`content_text`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
--
-- Daten für Tabelle `tbl_html_umlauts`
--
INSERT INTO `tbl_html_umlauts` (`content_id`, `content_text`) VALUES
(1, 'Berlin'),
(2, 'Hamburg'),
(3, 'München'),
(4, 'Köln'),
(5, 'Frankfurt am Main'),
(6, 'Stuttgart'),
(7, 'Dortmund'),
(8, 'Düsseldorf'),
(9, 'Essen'),
(10, 'Bremen'),
(11, 'Hannover'),
(12, 'Leipzig'),
(13, 'Dresden'),
(14, 'Nürnberg'),
(15, 'Duisburg'),
(16, 'Bochum'),
(17, 'Wuppertal'),
(18, 'Bielefeld'),
(19, 'Bonn'),
(20, 'Mannheim'),
(21, 'Karlsruhe'),
(22, 'Wiesbaden'),
(23, 'Münster'),
(24, 'Augsburg'),
(25, 'Gelsenkirchen'),
(26, 'Aachen'),
(27, 'Mönchengladbach'),
(28, 'Braunschweig'),
(29, 'Chemnitz'),
(30, 'Kiel'),
(31, 'Krefeld'),
(32, 'Halle (Saale)'),
(33, 'Magdeburg'),
(34, 'Freiburg im Breisgau'),
(35, 'Oberhausen'),
(36, 'Lübeck'),
(37, 'Erfurt'),
(38, 'Rostock'),
(39, 'Mainz'),
(40, 'Kassel'),
(41, 'Hagen'),
(42, 'Hamm'),
(43, 'Saarbrücken'),
(44, 'Mülheim an der Ruhr'),
(45, 'Herne'),
(46, 'Ludwigshafen am Rhein'),
(47, 'Osnabrück'),
(48, 'Solingen'),
(49, 'Leverkusen'),
(50, 'Oldenburg'),
(51, 'Potsdam'),
(52, 'Neuss'),
(53, 'Heidelberg'),
(54, 'Paderborn'),
(55, 'Darmstadt'),
(56, 'Regensburg'),
(57, 'Würzburg'),
(58, 'Ingolstadt'),
(59, 'Heilbronn'),
(60, 'Ulm'),
(61, 'Göttingen'),
(62, 'Wolfsburg'),
(63, 'Recklinghausen'),
(64, 'Pforzheim'),
(65, 'Offenbach am Main'),
(66, 'Bottrop'),
(67, 'Bremerhaven'),
(68, 'Fürth'),
(69, 'Remscheid'),
(70, 'Reutlingen'),
(71, 'Moers'),
(72, 'Koblenz'),
(73, 'Bergisch Gladbach'),
(74, 'Erlangen'),
(75, 'Trier'),
(76, 'Salzgitter'),
(77, 'Siegen'),
(78, 'Jena'),
(79, 'Hildesheim'),
(80, 'Cottbus'),
(81, 'Gera');
-- --------------------------------------------------------
--
-- Tabellenstruktur für Tabelle `tbl_plain_umlauts`
--
CREATE TABLE IF NOT EXISTS `tbl_plain_umlauts` (
`content_id` int(11) NOT NULL auto_increment,
`content_text` varchar(255) NOT NULL,
PRIMARY KEY (`content_id`),
FULLTEXT KEY `content_text` (`content_text`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
--
-- Daten für Tabelle `tbl_plain_umlauts`
--
INSERT INTO `tbl_plain_umlauts` (`content_id`, `content_text`) VALUES
(1, 'Berlin'),
(2, 'Hamburg'),
(3, 'München'),
(4, 'Köln'),
(5, 'Frankfurt am Main'),
(6, 'Stuttgart'),
(7, 'Dortmund'),
(8, 'Düsseldorf'),
(9, 'Essen'),
(10, 'Bremen'),
(11, 'Hannover'),
(12, 'Leipzig'),
(13, 'Dresden'),
(14, 'Nürnberg'),
(15, 'Duisburg'),
(16, 'Bochum'),
(17, 'Wuppertal'),
(18, 'Bielefeld'),
(19, 'Bonn'),
(20, 'Mannheim'),
(21, 'Karlsruhe'),
(22, 'Wiesbaden'),
(23, 'Münster'),
(24, 'Augsburg'),
(25, 'Gelsenkirchen'),
(26, 'Aachen'),
(27, 'Mönchengladbach'),
(28, 'Braunschweig'),
(29, 'Chemnitz'),
(30, 'Kiel'),
(31, 'Krefeld'),
(32, 'Halle (Saale)'),
(33, 'Magdeburg'),
(34, 'Freiburg im Breisgau'),
(35, 'Oberhausen'),
(36, 'Lübeck'),
(37, 'Erfurt'),
(38, 'Rostock'),
(39, 'Mainz'),
(40, 'Kassel'),
(41, 'Hagen'),
(42, 'Hamm'),
(43, 'Saarbrücken'),
(44, 'Mülheim an der Ruhr'),
(45, 'Herne'),
(46, 'Ludwigshafen am Rhein'),
(47, 'Osnabrück'),
(48, 'Solingen'),
(49, 'Leverkusen'),
(50, 'Oldenburg'),
(51, 'Potsdam'),
(52, 'Neuss'),
(53, 'Heidelberg'),
(54, 'Paderborn'),
(55, 'Darmstadt'),
(56, 'Regensburg'),
(57, 'Würzburg'),
(58, 'Ingolstadt'),
(59, 'Heilbronn'),
(60, 'Ulm'),
(61, 'Göttingen'),
(62, 'Wolfsburg'),
(63, 'Recklinghausen'),
(64, 'Pforzheim'),
(65, 'Offenbach am Main'),
(66, 'Bottrop'),
(67, 'Bremerhaven'),
(68, 'Fürth'),
(69, 'Remscheid'),
(70, 'Reutlingen'),
(71, 'Moers'),
(72, 'Koblenz'),
(73, 'Bergisch Gladbach'),
(74, 'Erlangen'),
(75, 'Trier'),
(76, 'Salzgitter'),
(77, 'Siegen'),
(78, 'Jena'),
(79, 'Hildesheim'),
(80, 'Cottbus'),
(81, 'Gera');