Old Icelandic sort order: patch for ctype-uca.c
Posted by: Sven Müchert
Date: September 12, 2006 12:27PM

Hi!

I'm new here, so hi! :-)

I noticed that 'COLLATE utf8_icelandic_ci' does not account for Old Icelandic. The sort order for the additional characters of Old Icelandic should be:

a..d,ð,e..z,þ,æ,œ,ǫ,ø

Characters with a macron, e.g. ǭ, should probably be treated as if they had an acute accent, so that they sort as separate characters after the unaccented one. The problem characters are œ and ǫ, which are currently treated like 'o' instead of being sorted in the above order. The others are already in the correct order under the Modern Icelandic rules.

Sometimes ę is used; it should sort as a secondary variant after æ.
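
As a rough illustration of the primary order described above (this is not MySQL code; the names old_icelandic_rank and old_icelandic_cmp are made up for this sketch), the order can be written as a rank function over lowercase code points. It ignores case and accents entirely, so it only shows where ð, þ, æ, œ, ǫ and ø fall relative to a..z:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of MySQL: primary rank of a lowercase
   code point in the order a..d, eth, e..z, thorn, ae, oe, o-ogonek, o-slash.
   Returns -1 for anything else, so unknown characters sort first. */
static int old_icelandic_rank(uint32_t cp)
{
  if (cp >= 'a' && cp <= 'd') return (int)(cp - 'a');      /* 0..3  */
  if (cp == 0x00F0) return 4;                              /* eth   */
  if (cp >= 'e' && cp <= 'z') return (int)(cp - 'e') + 5;  /* 5..26 */
  switch (cp) {
    case 0x00FE: return 27;  /* thorn    */
    case 0x00E6: return 28;  /* ae       */
    case 0x0153: return 29;  /* oe       */
    case 0x01EB: return 30;  /* o ogonek */
    case 0x00F8: return 31;  /* o slash  */
    default:     return -1;  /* not covered by this sketch */
  }
}

/* Compare two code point strings by primary rank only.
   Returns -1, 0 or 1; a shorter string sorts before its extensions. */
static int old_icelandic_cmp(const uint32_t *a, size_t alen,
                             const uint32_t *b, size_t blen)
{
  size_t i, n = alen < blen ? alen : blen;
  for (i = 0; i < n; i++) {
    int ra = old_icelandic_rank(a[i]);
    int rb = old_icelandic_rank(b[i]);
    if (ra != rb) return ra < rb ? -1 : 1;
  }
  if (alen == blen) return 0;
  return alen < blen ? -1 : 1;
}
```

With this ranking, œ (U+0153) compares greater than 'z' and less than ǫ (U+01EB), which is the behaviour wanted from the collation instead of folding both into 'o'.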

I searched the source code for the corresponding table and found it in strings/ctype-uca.c. There is a table 'icelandic' that I modified to read as follows:

static const char icelandic[]=
"& A < \\u00E1 <<< \\u00C1 << \\u0101 <<< \\u0100 "
"& D < \\u00F0 <<< \\u00D0 "
"& E << \\u0119 <<< \\u0118 " /* e << e ogonek */
"< \\u00E9 <<< \\u00C9 << \\u0113 <<< \\u0112 " /* < e acute << e macron */
"& I < \\u00ED <<< \\u00CD << \\u012B <<< \\u012A "
"& O < \\u00F3 <<< \\u00D3 << \\u014D <<< \\u014C "
"& U < \\u00FA <<< \\u00DA << \\u016B <<< \\u016A "
"& Y < \\u00FD <<< \\u00DD << \\u0233 <<< \\u0232 "
"& Z < \\u00FE <<< \\u00DE " /* thorn */
"< \\u00E6 <<< \\u00C6 " /* ae */
/* "<< \\u00E4\\u0301 <<< \\u00C4\\u0301 " e ogonek w/ acute: HOWTO? */
/* "<< \\u00E4\\u0304 <<< \\u00C4\\u0304 " e ogonek w/ macron: HOWTO? */
"<< \\u00E4 <<< \\u00C4 " /* a diaeresis */
"< \\u0153 <<< \\u0152 " /* oe */
"< \\u01EB <<< \\u01EA " /* o ogonek */
/* "< \\u01EB\\u0301 <<< \\u01EA\\u0301 " o ogonek + acute: HOWTO?
"<< \\u01ED <<< \\u01EC " o ogonek w/ macron */
"< \\u01ED <<< \\u01EC " /* o ogonek w/ macron */
"< \\u00F6 <<< \\u00D6 << \\u00F8 <<< \\u00D8 " /* o slash = o diaeresis */
"< \\u00E5 <<< \\u00C5 "; /* a ring */

This does not alter the original sort order; it only adds support for Old Icelandic. I test-compiled it and tested mysqld for correct ordering. One problem remains: I don't know how to add a sort order for composite characters (ę with macron or acute, and ǫ with acute). If I put \\u0301 or \\u0304 after the base characters, the table is rejected.
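
For anyone who wants to build test data containing the composite forms (base character followed by a combining mark), here is a minimal sketch of encoding such a sequence as UTF-8 by hand. The helper names utf8_encode and utf8_encode_seq are hypothetical and not taken from the MySQL source, and the encoder only covers code points up to U+FFFF, which is enough for the characters discussed here:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point (U+0000..U+FFFF) as UTF-8.
   Returns the number of bytes written to out (1..3), or 0 if the code
   point is outside the range handled by this sketch. Surrogates are
   not checked. */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
  if (cp < 0x80) {
    out[0] = (unsigned char)cp;
    return 1;
  }
  if (cp < 0x800) {
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  return 0;
}

/* Encode n code points into out (capacity cap bytes) and NUL-terminate.
   Returns the byte length without the terminator, or 0 on failure. */
static size_t utf8_encode_seq(const uint32_t *cps, size_t n,
                              unsigned char *out, size_t cap)
{
  size_t i, j, len = 0;
  if (cap == 0) return 0;
  for (i = 0; i < n; i++) {
    unsigned char tmp[3];
    size_t k = utf8_encode(cps[i], tmp);
    if (k == 0 || len + k >= cap) return 0;  /* unsupported, or no room for NUL */
    for (j = 0; j < k; j++) out[len++] = tmp[j];
  }
  out[len] = '\0';
  return len;
}
```

Passing the sequence U+01EB, U+0301 (ǫ followed by a combining acute) yields the four bytes C7 AB CC 81, which can then be inserted into a test table to see how a given collation orders it.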

Anyway, the above is an improvement, I think.

I hope this is the right place to post this patch?

Sven
