Re: modify a character set file: add & and - as word characters
Posted by: Duane Hitz
Date: October 12, 2006 12:07AM

I would like a response to this as well...

We are seeing a significant number of missed search results because "-" is not treated as a word character. We have many, many coded "words" (IO-30, CO-21, etc.) that need to be searchable.
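To make the symptom concrete, here is a rough Python sketch of how a full-text parser might split such codes. It is a simplified model, not MySQL's parser: the word-character rule (letters, digits and "_") and the minimum word length of 4 (what I understand the default ft_min_word_len to be) are assumptions, and the function names are made up for illustration.

```python
import re

# Assumption: a rough model of the MyISAM full-text parser, not its real code.
# Word characters are taken to be letters, digits and "_"; anything else
# ends the current word.
MIN_WORD_LEN = 4  # assumed default value of ft_min_word_len


def tokenize(text, extra_word_chars=""):
    """Split text into words, treating extra_word_chars as part of a word."""
    pattern = "[A-Za-z0-9_" + re.escape(extra_word_chars) + "]+"
    return re.findall(pattern, text)


def indexed_words(text, extra_word_chars=""):
    """Keep only the words long enough to be indexed."""
    words = tokenize(text, extra_word_chars)
    return [w for w in words if len(w) >= MIN_WORD_LEN]


# "-" splits the codes into short fragments that fall below the minimum length.
print(indexed_words("part IO-30 and CO-21"))
# With "-" counted as a word character, the codes survive as whole words.
print(indexed_words("part IO-30 and CO-21", extra_word_chars="-"))
```

Under these assumptions, a code like IO-30 breaks into "IO" and "30", and both pieces are too short to be indexed at all, which would explain the misses even before the "-" question comes up.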

I am assuming that the <ctype><map> section defines what is and what is not a character. Comparing it against the lower and upper maps, it looks like "01" indicates a character... all except a-g and A-G, which are "81"... presumably an indication that these are "lucky" letters, since in Chinese numerology, the number 8 means wealth.

And the mysterious "00" on its own line must be some sort of odd confirmation of Gödel's incompleteness theorem... or an attempt at an infinity symbol, since it would take someone (ok, maybe just me) infinitely long to decode the character set mapping given the total lack of documentation of these files (at least that I could find after two hours of searching) along with my unequaled lack of patience. Call it Duane's incompleteness theorem.
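For what it is worth, the ctype values look less mystical if each byte is read as a set of bit flags. The sketch below decodes a few entries that way. The flag table is my understanding of MySQL's character-type definitions (upper, lower, digit, space, punctuation, control, blank, hex digit) and should be treated as an assumption to check against the manual or source for your version.

```python
# Flag bits as I understand them from MySQL's character-type definitions.
# Assumption: verify these against your server version before relying on them.
FLAGS = {
    0x01: "upper",
    0x02: "lower",
    0x04: "digit",
    0x08: "space",
    0x10: "punct",
    0x20: "control",
    0x40: "blank",
    0x80: "hexdigit",
}


def decode(hex_byte):
    """Return the names of the flags set in a two-digit hex ctype entry."""
    value = int(hex_byte, 16)
    return [name for bit, name in FLAGS.items() if value & bit]


for entry in ("01", "81", "02", "82", "10", "00"):
    print(entry, decode(entry))
```

Read this way, "81" would be an uppercase letter that is also a hex digit, "82" the lowercase equivalent, and "00" an entry with no flags at all. If that reading is right, the lone leading "00" may simply be an extra first slot in the table rather than a character of its own.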

I changed the positional mapping of "2D" to a "01" in the <ctype><map> section of the Latin1.xml file. I shut down MySQL (5.0.16-nt), dropped and rebuilt the fulltext index, and voilà! No results. When doing a brute force LIKE '%string%' on the same string, I get 13 rows.
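One thing worth ruling out with a hand edit like this is an off-by-one in the position. The sketch below assumes the map holds 257 entries, with the lone leading "00" as entry 0, so that the entry for character code c sits at index c + 1. Both the layout and the helper name patch_ctype are assumptions for illustration, and the sample map is synthetic, not the contents of the real file.

```python
def patch_ctype(map_text, char, new_value):
    """Return the map entries with the entry for `char` replaced.

    Assumption: the map holds 257 hex bytes, where entry 0 is the lone
    leading 00 and the entry for character code c sits at index c + 1.
    """
    entries = map_text.split()
    if len(entries) != 257:
        raise ValueError("expected 257 entries, got %d" % len(entries))
    entries[ord(char) + 1] = new_value
    return entries


# Synthetic stand-in for the real map: a leading 00, then 256 "10" bytes.
sample = "00 " + " ".join(["10"] * 256)
patched = patch_ctype(sample, "-", "01")

# The entry for "-" (0x2D) is at index 0x2D + 1; its neighbour is untouched.
print(patched[0x2D + 1], patched[0x2D])
```

If the layout assumption holds, counting 0x2D positions from the first value without allowing for the leading "00" would change the entry for "," instead of "-".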

The table's character set is Latin1 and I even went as far as dropping and forcing the column character set to Latin1 (not that it made a difference).

The cryptic "Then use the given character set for your FULLTEXT indexes." is somewhat disturbing, as if there were a powerful but undocumented "create fulltext index blah on table blahblah oh and by the way use character set Latin1" syntax.

Does anyone have any idea where someone might have actually documented this stuff?
