Hi!
I recently changed the collation/character set in the whole database for a project I'm working on to utf8. After a while I realized that REGEXP didn't work as it used to. I looked in the manual and saw that REGEXP was not multibyte safe. For example, if I have column called 'info' in a table 'news' and a row with the string "början" in colum 'info' and I would use these queries:
SELECT * FROM news WHERE REGEXP "början";
SELECT * FROM news WHERE REGEXP "BÖRJAN";
(I have simplified these queries so it would be more obvious, ordinarily the regexp's are much more complex, so I cannot for example use LIKE instead)
This used to give the same result, I would get the row containing the string "början" whatever case I used. However, when I changed the database to utf8 (from latin1) the second query wouldn't work. I can still do REGEXP searches with different case as long as the characters are non-"foreign" (åäö and so on).
I found this in
http://dev.mysql.com/doc/refman/5.0/en/string-comparison-functions.html :
"REGEXP and RLIKE use the current character set (cp1252 Latin1 by default) when deciding the type of a character. Warning: These operators are not multi-byte safe."
Does this mean that REGEXP will only work with "foreign" characters if the column has a latin1 collation? Or does it mean that I can change the "current character set" to utf8 and REGEXP will work with "foreign" characters? Because I have tried that and it will still not work.
Everything is set to utf8: character_set_client, character_set_connection, character_set_database, character_set_results, character_set_server, character_set_system, and all collations are utf8. I have changed the MySQL-configuration (my.ini) so that default-character-set is utf8_general_ci (I have tried utf8_swedish_ci too) for both client and server. It wont work. But if I change collation only for the column I'm doing REGEXP's on to latin1 it works!
Am I forced to use latin1 if I want to do proper REGEXP selects? Has anyone else had this problem? I would be ever grateful for any help.
Thanks
/Oskar