Fixed: Control Characters
Posted by: Bambarbia Kirkudu
Date: May 23, 2007 11:45AM

I think I have located the problem.

Some control characters are not included in ISO-8859-1, and "ISO-8859-1 is (according to the standards at least) the default encoding of documents delivered via HTTP".

ISO-8859-1 (the IANA charset, which adds the C0 and C1 control codes to ISO/IEC 8859-1) includes control characters: \x80 is the "Padding Character" and should not be part of serialized text! (I convert ISO-8859-1 to a Java String and send it to MySQL as utf8.)
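To illustrate, here is a minimal Java sketch of what I believe is happening. The class and method names (ControlCharDemo, decodeLatin1, decodeCp1252, stripC1Controls) are made up for this example, and stripping the C1 range is only one possible workaround, not necessarily the right fix for every application. It assumes the windows-1252 charset is available in the JVM, which is normally the case.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ControlCharDemo {
    // Decode as ISO-8859-1: every byte 0xNN becomes the code point U+00NN,
    // so bytes 0x80..0x9F turn into C1 control characters.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Decode as windows-1252: here byte 0x80 is a printable character (the euro sign).
    static String decodeCp1252(byte[] raw) {
        return new String(raw, Charset.forName("windows-1252"));
    }

    // Hypothetical workaround: drop C1 control characters (U+0080..U+009F)
    // before the string is sent to the database.
    static String stripC1Controls(String s) {
        return s.replaceAll("[\\x80-\\x9F]", "");
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0x80, 0x41}; // byte 0x80 followed by 'A'

        String latin1 = decodeLatin1(raw);
        String cp1252 = decodeCp1252(raw);

        System.out.println(Integer.toHexString(latin1.charAt(0))); // 80   (control character)
        System.out.println(Integer.toHexString(cp1252.charAt(0))); // 20ac (euro sign)
        System.out.println(stripC1Controls(latin1));               // A
    }
}
```

If the source text was really mislabeled Windows-1252, decoding it as windows-1252 is probably the better choice than stripping, since no characters are lost.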

http://en.wikipedia.org/wiki/ISO_8859-1

"Bug" report: http://bugs.mysql.com/bug.php?id=28263

P.S.
"Many web browsers and e-mail clients will interpret ISO-8859-1 control codes as Windows-1252 characters in order to accommodate such mislabeling but it is not a standard behaviour and care should be taken to avoid it."
http://en.wikipedia.org/wiki/ISO_8859-1

P.P.S.
I can use a binary type instead of TEXT: less memory, and no conversion to 3-byte UTF-8.
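A rough sketch of the size difference, in Java. StorageSizeDemo and its methods are names I made up for illustration. It only compares encoded byte counts on the Java side and says nothing about how MySQL actually lays out the column on disk.

```java
import java.nio.charset.StandardCharsets;

final class StorageSizeDemo {
    // Bytes needed when the text is kept as raw ISO-8859-1 (one byte per character).
    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // Bytes needed after conversion to UTF-8 (characters above U+007F take more than one byte).
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String text = "caf\u00E9"; // "café": three ASCII letters plus one Latin-1 letter

        System.out.println(latin1Length(text)); // 4
        System.out.println(utf8Length(text));   // 5
    }
}
```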



Edited 1 time(s). Last edit at 05/23/2007 11:46AM by Bambarbia Kirkudu.

