Re: ucs-2 server - yes. usc-2 client - no. What the ....?
Posted by: Mike Lischke
Date: January 20, 2006 03:08AM

Cristi Ionescu wrote:

> From what I've found out, UTF16 uses more bytes
> than utf8 in storing the characters.

Depends. UTF-16 uses two bytes per character, except for surrogates (http://www.unicode.org/faq/utf_bom.html#34), where one character is built from two 16-bit word values. Surrogate pairs aren't used that much, though, and the Unicode Consortium has assigned no characters to the code points in the surrogate area itself, so one can safely assume UTF-16 almost always uses a single word value to store a code point. UTF-8, however, encodes a character with a variable number of bytes. This can be just one byte for the basic Latin characters that are also covered by the ASCII character set (which is intentional, to keep things interoperable and make migrations smooth), but it can also be 4 bytes for certain characters. For a comparison of the encodings read http://www.unicode.org/faq/utf_bom.html#37.
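To make the size comparison concrete, here is a minimal Python sketch that counts the bytes each encoding needs for a few sample characters. The helper name encoded_sizes and the choice of samples are mine, purely for illustration, and the byte order mark is left out by using the "utf-16-le" codec.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, without a byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Hypothetical sample characters, one from each size class.
samples = {
    "ASCII letter A (U+0041)": "\u0041",
    "e with acute (U+00E9)": "\u00e9",
    "Euro sign (U+20AC)": "\u20ac",
    "Musical G clef (U+1D11E)": "\U0001D11E",  # needs a surrogate pair in UTF-16
}

for label, ch in samples.items():
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"{label}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

So for plain ASCII text UTF-8 is the smaller one, for the Euro sign UTF-16 is smaller, and for characters that need surrogates both end up at 4 bytes.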

> I am interested too in the utf16 vs utf8 problem
> and the supporting of utf16 in mysql,

Actually, MySQL *does* support a 16-bit Unicode encoding; however, it is named UCS-2, which is essentially UTF-16 without surrogate pairs (read here for the differences: http://www.unicode.org/faq/basic_q.html#23). You can find all supported character sets on this page: http://dev.mysql.com/doc/refman/5.0/en/charset-charsets.html.
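If you want to check whether your data would be affected by that difference, a rough Python sketch could look like the following. It assumes UCS-2 is treated as a fixed two-byte encoding that cannot use surrogate pairs; the function names fits_in_ucs2 and utf16_code_units are made up for this example, and lone surrogate code points in the input are not handled.

```python
def fits_in_ucs2(text):
    """True if every character lies in the Basic Multilingual Plane (<= U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def utf16_code_units(text):
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


for sample in ("caf\u00e9", "\u20ac", "\U0001D11E"):
    print(repr(sample), fits_in_ucs2(sample), utf16_code_units(sample))
```

Any string for which fits_in_ucs2 returns False contains at least one character that needs two code units in UTF-16 and therefore has no UCS-2 representation.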

Mike

Mike Lischke, MySQL Developer Tools
Oracle Corporation

