Old Icelandic sort order: patch for ctype-uca.c
Hi!
I'm new here, so hi! :-)
I noticed that 'COLLATE utf8_icelandic_ci' does not account for Old Icelandic. The sort order for the additional characters of Old Icelandic should be:
a..d,ð,e..z,þ,æ,œ,ǫ,ø
Characters with a macron, e.g. ǭ, should probably be treated as if they had an acute accent, i.e. as distinct characters ordered after the unaccented character. The problem characters are œ and ǫ, which are currently treated like 'o' instead of being sorted in the order above. The others are already in the correct order under the Modern Icelandic rules.
Sometimes, ę is used, and should be secondary after æ.
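To make the intended order concrete, here is a minimal sketch in plain C. It is not MySQL code and does not use the UCA tables; the names oi_order, oi_rank and oi_compare are made up for illustration. It covers only the primary order of the lowercase letters listed above and ignores accents, case and ę.

```c
#include <stddef.h>

/* Lowercase letters in the Old Icelandic primary order given in the post:
   a..d, eth, e..z, thorn, ae, oe, o ogonek, o slash.
   Illustration only; accented vowels and upper case are left out. */
static const unsigned int oi_order[] = {
  'a', 'b', 'c', 'd',
  0x00F0,                               /* eth */
  'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
  'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
  0x00FE,                               /* thorn */
  0x00E6,                               /* ae */
  0x0153,                               /* oe */
  0x01EB,                               /* o ogonek */
  0x00F8                                /* o slash */
};

/* Return the position of a code point in oi_order, or -1 if it is not listed. */
static int oi_rank(unsigned int cp)
{
  size_t i;
  for (i = 0; i < sizeof(oi_order) / sizeof(oi_order[0]); i++)
    if (oi_order[i] == cp)
      return (int) i;
  return -1;
}

/* Negative, zero or positive, like strcmp(), for two listed letters. */
static int oi_compare(unsigned int a, unsigned int b)
{
  return oi_rank(a) - oi_rank(b);
}
```

With this, oi_compare(0x0153, 0x01EB) is negative (œ before ǫ), and both rank after 'z' rather than next to 'o'.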
I searched the source code for the corresponding table and found it in strings/ctype-uca.c. There is a table 'icelandic' that I modified to read as follows:
static const char icelandic[]=
"& A < \\u00E1 <<< \\u00C1 << \\u0101 <<< \\u0100 "
"& D < \\u00F0 <<< \\u00D0 "
"& E << \\u0119 <<< \\u0118 " /* e << e ogonek */
"< \\u00E9 <<< \\u00C9 << \\u0113 <<< \\u0112 " /* < e acute << e macron */
"& I < \\u00ED <<< \\u00CD << \\u012B <<< \\u012A "
"& O < \\u00F3 <<< \\u00D3 << \\u014D <<< \\u014C "
"& U < \\u00FA <<< \\u00DA << \\u016B <<< \\u016A "
"& Y < \\u00FD <<< \\u00DD << \\u0233 <<< \\u0232 "
"& Z < \\u00FE <<< \\u00DE " /* thorn */
"< \\u00E6 <<< \\u00C6 " /* ae */
/* "<< \\u00E4\\u0301 <<< \\u00C4\\u0301 " e ogonek w/ acute: HOWTO? */
/* "<< \\u00E4\\u0304 <<< \\u00C4\\u0304 " e ogonek w/ macron: HOWTO? */
"<< \\u00E4 <<< \\u00C4 " /* a diaeresis */
"< \\u0153 <<< \\u0152 " /* oe */
"< \\u01EB <<< \\u01EA " /* o ogonek */
/* "< \\u01EB\\u0301 <<< \\u01EA\\u0301 " o ogonek + acute: HOWTO?
"<< \\u01ED <<< \\u01EC " o ogonek w/ macron */
"< \\u01ED <<< \\u01EC " /* o ogonek w/ macron */
"< \\u00F6 <<< \\u00D6 << \\u00F8 <<< \\u00D8 " /* o slash = o diaeresis */
"< \\u00E5 <<< \\u00C5 "; /* a ring */
This does not alter the original sort order; it only adds support for Old Icelandic. I test-compiled it and tested mysqld for correct order. One problem remains: I don't know how to add a sort order for composite chars (ę with macron or acute, and ǫ with acute). If I put \\u0301 or \\u0304 behind the base chars, the table is rejected.
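For anyone wondering why these cases are different: ǫ with acute has no single precomposed code point, so in UTF-8 text it arrives as the base character U+01EB followed by the combining acute U+0301, whereas ǭ is the single code point U+01ED. A small sketch in plain C that shows this by counting code points; utf8_decode and count_code_points are hypothetical helper names, not MySQL functions, and the decoder only handles sequences of up to three bytes.

```c
#include <stddef.h>

/* Decode one UTF-8 sequence (1 to 3 bytes) at s into *cp.
   Returns the number of bytes consumed, or 0 if the bytes are not a
   valid sequence this sketch understands. */
static size_t utf8_decode(const unsigned char *s, unsigned int *cp)
{
  if (s[0] < 0x80)
  {
    *cp = s[0];
    return 1;
  }
  if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80)
  {
    *cp = ((unsigned int) (s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    return 2;
  }
  if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80)
  {
    *cp = ((unsigned int) (s[0] & 0x0F) << 12) |
          ((unsigned int) (s[1] & 0x3F) << 6) | (s[2] & 0x3F);
    return 3;
  }
  return 0;
}

/* Count the code points in a NUL-terminated UTF-8 string.
   Stops early if it meets a sequence utf8_decode() rejects. */
static size_t count_code_points(const char *str)
{
  const unsigned char *s = (const unsigned char *) str;
  size_t count = 0;
  while (*s)
  {
    unsigned int cp;
    size_t len = utf8_decode(s, &cp);
    if (len == 0)
      break;
    s += len;
    count++;
  }
  return count;
}
```

Here count_code_points("\xC7\xAB\xCC\x81") (ǫ plus combining acute) gives 2, while count_code_points("\xC7\xAD") (ǭ) gives 1, so a rule for the former has to match a two-character sequence rather than a single character.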
Anyway, the above is an improvement, I think.
I hope this is the right place to post this patch?
Sven
Posted: September 12, 2006 12:27PM