Hi everybody!
We are running MySQL 5.0 in production and we are located in Croatia. A few months ago we had to switch to utf8. We switched from latin2_croatian_ci to utf8_general_ci - the general utf8 charset and collation.
Of course, ORDER BY now returns nonsense, since MySQL gives our national letters the same weight as their base letters (for example, "č" and "ć" both sort as "c").
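To illustrate the problem outside the database, here is a rough Python sketch. It is only an approximation of what an accent-insensitive collation does, not MySQL's actual implementation: the function accent_insensitive_key and the sample words are made up for illustration, and stripping combining marks is my assumption about how to mimic "same weight".

```python
import unicodedata


def accent_insensitive_key(word: str) -> str:
    """Approximate a collation that ignores diacritics (illustrative only)."""
    # Split each accented letter into base letter + combining mark ...
    decomposed = unicodedata.normalize("NFD", word)
    # ... then drop the marks, so that "č" and "ć" both become "c".
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


words = ["čvor", "cvijet", "ćup", "cesta", "čaj"]
print(sorted(words, key=accent_insensitive_key))
```

With this key, "čaj" lands in front of "cesta" and "ćup" in front of "cvijet", whereas the Croatian alphabet wants every "c" word first, then the "č" words, then the "ć" words.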
Many months have passed, but a solution is still unavailable. The closest match to our collation is utf8_slovenian_ci, but:
1. Slovenian treats "lj", "nj" and "dž" as two letters each, while in Croatian each of them is a single letter
2. Slovenian doesn't support the letter "ć"
... so ORDER BY still returns badly sorted values.
We could work some wizardry by converting only the single field we order by into latin2_croatian_ci while keeping the rest of the DB in utf8_general_ci, but the documentation on CONVERT() is pretty slim (http://dev.mysql.com/doc/refman/5.0/en/charset-convert.html).
The other, even worse, wizardry is adding a new field that holds a pseudo-language version of the text, with our letters replaced in the following way:
c = c
ć = cx
č = cxx
...
This is very bad in the long run.
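For anyone who ends up with the extra-field workaround anyway, here is a minimal Python sketch of one way to build such a sort key. It is a variation on the idea above, not the exact x-suffix scheme: every letter of the Croatian alphabet (in its standard order, with "dž", "lj" and "nj" as single letters) becomes a fixed-width two-digit code, so a plain byte-wise comparison gives the right order. The names CROATIAN_ALPHABET, CODES and shadow_key are hypothetical. Two simplifications to be aware of: it treats every "dž", "lj" and "nj" sequence as one letter, which is wrong for words such as "injekcija" or "nadživjeti", and it maps anything outside the alphabet to "00".

```python
import unicodedata

# Standard Croatian alphabet order; the digraphs count as single letters.
CROATIAN_ALPHABET = [
    "a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h", "i", "j",
    "k", "l", "lj", "m", "n", "nj", "o", "p", "r", "s", "š", "t", "u", "v",
    "z", "ž",
]
# Two-digit codes ("01" .. "30") keep the key fixed-width per letter.
CODES = {letter: f"{i:02d}" for i, letter in enumerate(CROATIAN_ALPHABET, start=1)}
DIGRAPHS = ("dž", "lj", "nj")


def shadow_key(text: str) -> str:
    """Build a sortable ASCII key for the extra field (illustrative sketch)."""
    text = unicodedata.normalize("NFC", text).lower()
    parts = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            # Simplification: every such pair is treated as one letter.
            parts.append(CODES[pair])
            i += 2
        else:
            # Simplification: characters outside the alphabet get "00".
            parts.append(CODES.get(text[i], "00"))
            i += 1
    return "".join(parts)


words = ["ljeto", "luk", "čaj", "cesta", "ćup", "njuh", "nos",
         "cvijet", "džep", "dan", "đak"]
print(sorted(words, key=shadow_key))
```

The key would be stored in the extra field and used in ORDER BY instead of the original one. It has to be regenerated on every insert and update, which is exactly why this approach hurts in the long run.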
The question is... How can we make our own collation for the Croatian language?
What to do? Where to go? :)
Thank you all in advance!
---
Update: changing the source and recompiling MySQL isn't very appealing. :) Instead I've found this: /usr/share/mysql/charsets/Index.xml, along with an example of an experimental Vietnamese collation. Am I on the right track?
Best regards
Neven
---
seven | the witchdoctor
http://www.nivas.hr - uploading 24/7!
Edited 2 time(s). Last edit at 01/17/2008 02:09PM by Neven J.