Re: Sure: Bug with Encoding: \x80 \x81 ...
Bambarbia Kirkudu Wrote:
-------------------------------------------------------
[snip]
> How does it happen? Easy. I have a byte array. I
> convert it to a Java String using windows-1252,
> new String(byte[], "windows-1252"). Windows-1252
> is an unapproved superset of ISO-8859-1. "it's better
> to treat ISO-8859-1 as synonymous with
> windows-1252 than to reject, as invalid, documents
> labelled as ISO-8859-1 that have characters
> outside ISO-8859-1"
>
> The byte array probably contains the byte \x80, which
> is a control character in ISO-8859-1. Even if it does,
> it is the 'euro sign' in windows-1252. The Java String
> should contain the Unicode 'euro sign'; JDBC should send
> it 'as is' to MySQL (am I right?), and 'as is' in this
> case should not contain \x80, because the euro sign has
> a different code in Unicode!!!
What character encoding is your connection set to use? What character encoding are your table and/or its columns using? There's no such concept as "as-is"; the string has to be sent to the server in some encoding. If a character can't be represented in that encoding, or the column's character set can't store it, you will get an error from the server.
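To illustrate why there is no "as-is", here is a minimal Java sketch using only the JDK's String and Charset APIs. It is not Connector/J code and says nothing about what the driver does internally; the class and method names are made up for illustration, and it assumes the JVM provides the windows-1252 charset (mainstream JDKs do). It shows that byte 0x80 decodes to U+20AC (euro sign) under windows-1252 but to the control character U+0080 under ISO-8859-1, and that the same U+20AC turns into different bytes depending on the target encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Formats bytes as space-separated hex, e.g. {0xE2, 0x82, 0xAC} -> "E2 82 AC".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes a single byte with the given charset and returns the resulting code point.
    static int decodeFirst(byte b, Charset cs) {
        return new String(new byte[] { b }, cs).codePointAt(0);
    }

    // Encodes a string with the given charset and returns the bytes as hex.
    // String.getBytes(Charset) silently replaces unmappable characters
    // with the charset's replacement bytes ('?' = 0x3F for ISO-8859-1).
    static String encodeHex(String s, Charset cs) {
        return hex(s.getBytes(cs));
    }

    public static void main(String[] args) {
        // Assumes the JVM ships the windows-1252 charset.
        Charset cp1252 = Charset.forName("windows-1252");

        // Decoding: byte 0x80 is the euro sign in windows-1252 but a C1 control in ISO-8859-1.
        System.out.printf("0x80 as windows-1252 -> U+%04X%n", decodeFirst((byte) 0x80, cp1252));
        System.out.printf("0x80 as ISO-8859-1   -> U+%04X%n", decodeFirst((byte) 0x80, StandardCharsets.ISO_8859_1));

        // Encoding: the String holds U+20AC; which bytes go out depends on the target charset.
        String euro = "\u20AC";
        System.out.println("UTF-8:        " + encodeHex(euro, StandardCharsets.UTF_8));   // E2 82 AC
        System.out.println("windows-1252: " + encodeHex(euro, cp1252));                   // 80
        System.out.println("ISO-8859-1:   " + encodeHex(euro, StandardCharsets.ISO_8859_1)); // 3F ('?')
    }
}
```

Note that String.getBytes quietly substitutes '?' when a character has no mapping; a driver or server may instead report an error or warning, depending on how the connection and server are configured.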
> So, for sure, it is a bug in JDBC. JDBC does some
> character conversion, and MySQL's UTF-8 is not the
> same as a Java String.
Java strings aren't in UTF-8 either. They're in Unicode, stored internally as UTF-16. There is a difference. In any case, if you stay within the BMP (Basic Multilingual Plane), there is no difference between MySQL's UTF-8 and Java's UTF-8.
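A small Java sketch of the UTF-16 vs. UTF-8 point, again using only JDK APIs (the class and method names are hypothetical). It assumes MySQL's 3-byte "utf8" character set, which can only hold BMP characters. The euro sign is one Java char and three UTF-8 bytes; a character outside the BMP, such as U+1F600, is one code point but two Java chars (a surrogate pair) and four UTF-8 bytes, which is where the two sides stop matching.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Demo {
    // Compares Java's internal UTF-16 length with the code point count and UTF-8 byte count.
    static String describe(String s) {
        int utf16Units = s.length();                        // UTF-16 code units (Java chars)
        int codePoints = s.codePointCount(0, s.length());   // actual Unicode characters
        int utf8Bytes = s.getBytes(StandardCharsets.UTF_8).length;
        return String.format("chars=%d codePoints=%d utf8Bytes=%d", utf16Units, codePoints, utf8Bytes);
    }

    // True if every code point is in the BMP (U+0000..U+FFFF), i.e. needs at most 3 UTF-8 bytes.
    static boolean isBmpOnly(String s) {
        return s.codePoints().allMatch(Character::isBmpCodePoint);
    }

    public static void main(String[] args) {
        String euro = "\u20AC";                                 // inside the BMP
        String emoji = new String(Character.toChars(0x1F600));  // outside the BMP
        System.out.println("euro:  " + describe(euro) + " bmpOnly=" + isBmpOnly(euro));
        System.out.println("emoji: " + describe(emoji) + " bmpOnly=" + isBmpOnly(emoji));
    }
}
```

For the euro sign this reports chars=1 codePoints=1 utf8Bytes=3; for U+1F600 it reports chars=2 codePoints=1 utf8Bytes=4.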
-Mark
Mark Matthews
Consulting Member Technical Staff - MySQL Enterprise Tools
Oracle
http://www.mysql.com/products/enterprise/monitor.html