Re: Sure: Bug with Encoding: \x80 \x81 ...
Posted by: Mark Matthews
Date: May 30, 2007 01:27PM

Bambarbia Kirkudu Wrote:
-------------------------------------------------------
[snip]
> How it happens? Easy. I have a bytearray. I
> convert it to a Java String using windows-1252,
> new String(byte[], "windows-1252"). Windows-1252
> is unapproved superset of ISO-8859-1. "it's better
> to treat ISO-8859-1 as synonymous with
> windows-1252 than to reject, as invalid, documents
> labelled as ISO-8859-1 that have characters
> outside ISO-8859-1"
>
> Byte array probably has character \x80 which is
> control character in ISO. Even if it has... It is
> 'euro sign' in Windows 1252. Java String should
> create 'euro sign' in Unicode; JDBC should send it
> 'as is' to MySQL (am I right?) and 'as is' in this
> case should not contain \x80 because Euro Sign has
> different code in Unicode!!!

What character encoding is your connection set to use? What character encoding do your table and its columns use? There is no such thing as sending a string "as-is": the string has to be sent to the server in some encoding. If it can't be represented in that encoding, or the column cannot store the character, you will get an error from the server.
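
To illustrate why "as-is" has no meaning on the wire, here is a minimal sketch (the class and method names are made up for this example, and it assumes a JDK that ships the windows-1252 charset, which standard JDKs do). It decodes the byte \x80 the way you describe and then encodes the resulting String in three different encodings:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Connector/J.
public class EuroRoundTrip {
    // Formats bytes as space-separated upper-case hex, e.g. "E2 82 AC".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes a single windows-1252 byte into a Java String.
    static String decodeCp1252(byte b) {
        return new String(new byte[] { b }, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String euro = decodeCp1252((byte) 0x80);

        // The String holds the Unicode euro sign, U+20AC.
        System.out.println(Integer.toHexString(euro.charAt(0)));                  // 20ac

        // The bytes depend entirely on the encoding chosen when sending.
        System.out.println(hex(euro.getBytes(StandardCharsets.UTF_8)));           // E2 82 AC
        System.out.println(hex(euro.getBytes(Charset.forName("windows-1252")))); // 80
        // ISO-8859-1 has no euro sign, so getBytes substitutes '?'.
        System.out.println(hex(euro.getBytes(StandardCharsets.ISO_8859_1)));      // 3F
    }
}
```

So whether \x80 shows up on the wire depends on the connection encoding, not on the String itself. With Connector/J that is normally chosen through the `characterEncoding` connection property.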

> So, for sure, it is a bug in JDBC. JDBC does some
> character conversion, and MySQL utf-8 is not the
> same as Java String.

Java strings aren't in UTF-8 either. They're in Unicode (held internally as UTF-16 code units), and that is not the same thing as a byte encoding such as UTF-8. In any case, if you stay within the BMP, there is no difference between MySQL's UTF-8 and Java's UTF-8.
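
If you want to check whether your data stays within the BMP, something along these lines should do (a rough sketch only; the class and helper names are invented for illustration and it needs Java 8 or later). It also shows the difference between what a String holds and what its UTF-8 encoding looks like:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
public class BmpCheck {
    // True if every code point in s lies in the BMP (U+0000..U+FFFF).
    static boolean isBmpOnly(String s) {
        return s.codePoints().allMatch(Character::isBmpCodePoint);
    }

    // Number of bytes s occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String euro = "\u20AC";                              // euro sign, inside the BMP
        String clef = new String(Character.toChars(0x1D11E)); // musical G clef, outside the BMP

        System.out.println(euro.length() + " " + utf8Length(euro) + " " + isBmpOnly(euro)); // 1 3 true
        System.out.println(clef.length() + " " + utf8Length(clef) + " " + isBmpOnly(clef)); // 2 4 false
    }
}
```

The euro sign is one char in the String and three bytes in UTF-8. The G clef is two chars (a surrogate pair) in the String and four bytes in UTF-8, and that is the kind of character that falls outside the BMP caveat above.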

-Mark

Mark Matthews
Consulting Member Technical Staff - MySQL Enterprise Tools
Oracle
http://www.mysql.com/products/enterprise/monitor.html
