Collation utf8_swedish_ci
Posted by: Pär Ågren
Date: June 09, 2005 03:58AM

I have a problem with the Swedish/Finnish collation, which does not behave as it should. I'm developing a phone book index (sort of) in which the user should be able to enter a few characters and browse through the index.

For this I've decided to use the handler statement rather than select.

I use the statement

handler read ndx>=(1,'ågren') limit 0,100

"ndx" is a composite index where the first column is a smallint that contains the "browse type" (which could be a personal name, a subject heading, an institutional name, etc.) and the second column is the data to browse.

When executing the above statement, all rows where the column starts with ågren are indeed returned, but they are followed by names starting with 'agr' (Agricola, etc.), which is not what a Swedish/Finnish user would want. They expect to see other names starting with å (probably 'åh') here.
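To make the expected order concrete, here is a minimal Python sketch of the ordering a Swedish/Finnish user would expect. It assumes a simplified alphabet (a-z followed by å, ä, ö) and ignores finer collation rules such as accents and v/w handling; the names `SWEDISH_ALPHABET`, `swedish_key` and the sample data are purely illustrative and have nothing to do with MySQL internals.

```python
# Simplified Swedish alphabet: a-z, then å, ä, ö (an assumption for this sketch).
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

def swedish_key(name):
    """Case-insensitive sort key: each letter maps to its alphabet position."""
    # Characters outside the alphabet get rank -1 and sort before all letters.
    return [RANK.get(ch, -1) for ch in name.lower()]

# Hypothetical sample names, not taken from the real table.
names = ["Agricola", "Ågren", "Åhman", "Zetterberg", "Ågrensson", "Öberg", "Ärlig"]

# Full list in the expected collation order.
ordered = sorted(names, key=swedish_key)

# Names at or after 'ågren' in that order: what the browse list should show.
from_agren = [n for n in ordered if swedish_key(n) >= swedish_key("ågren")]

print(ordered)
print(from_agren)
```

In this ordering 'Agricola' sorts before 'Ågren', so it should never appear in a list that starts at 'ågren'.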

The table is created with charset utf8 and collation utf8_swedish_ci, as is the index ndx.

I haven't tried skipping the composite index and just scanning the data column, but I would prefer not to have to use a workaround here, because performance is essential. Ideally the list of names returned should change as the user types. There will be a lot of rows in this table (probably 10-15 million).

Testing it with various select statements in MySQL Query Browser returns confusing results.
Leaving out any collate instruction in the select returns rows in the same erratic way. I'm sorry I cannot be more specific here, but some queries return nothing when they should return rows, and some variations with collate do return the correct rows but are very slow. I really have no idea what is going on.

An earlier question about the Swedish/Finnish collation pointed to the Swedish Unicode collation for the Mimer database, which seems correct. The answer was that MySQL uses the same settings, except that it does not treat v and w as the same character, which I can live with.

One would think that setting a collation on an index when creating it would mean that the BTREE is built using that collation, and that the handler statement would retrieve records in physical index order regardless of any character set setting, whether at server, connection or client level.
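The behaviour I would expect can be modelled roughly as follows. This is only a sketch of the idea, not how MySQL actually stores or reads a BTREE: the index is a list kept sorted by (browse type, collation key), and a read with >= seeks to the first matching entry and then walks forward in stored order. The simplified alphabet, `collation_key`, `read_from` and the sample rows are all assumptions made up for the illustration.

```python
import bisect

# Simplified Swedish alphabet (assumption): a-z, then å, ä, ö.
ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

def collation_key(text):
    """Map text to a tuple of alphabet positions (-1 for anything else)."""
    return tuple(ALPHABET.find(ch) for ch in text.lower())

# Hypothetical rows: (browse_type, value).
rows = [(1, "Agricola"), (2, "Ågren"), (1, "Åhman"),
        (1, "Ågren"), (1, "Zorn"), (2, "Berg")]

# The "index": entries sorted once by browse type, then by collation key.
index = sorted((t, collation_key(v), v) for t, v in rows)

def read_from(browse_type, start_value, limit):
    """Seek to the first entry >= (browse_type, start_value), then scan forward."""
    start = bisect.bisect_left(index, (browse_type, collation_key(start_value)))
    return [(t, v) for t, _, v in index[start:start + limit]]

print(read_from(1, "ågren", 3))
```

Because the scan simply follows stored order, nothing set on the server, connection or client can change which rows come next. In this model, names starting with 'agr' sort before the seek position and are never reached; once browse type 1 is exhausted the scan continues into browse type 2.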

