CHAR_LENGTH() returns incorrect value on Japanese UTF8 text
Posted by: Gregor Kaplan
Date: August 23, 2007 10:31AM

Execute the following:

CREATE TABLE multibyte
(
thing VARCHAR(20) CHARACTER SET utf8
);

INSERT INTO multibyte (thing) VALUES('human');
INSERT INTO multibyte (thing) VALUES('ははは'); # if you can't read this, it's "hahaha" in hiragana

SELECT thing, CHAR_LENGTH(thing), LENGTH(thing) FROM multibyte;

Result:
+--------+--------------------+---------------+
| thing  | CHAR_LENGTH(thing) | LENGTH(thing) |
+--------+--------------------+---------------+
| human  |                  5 |             5 |
| ははは |                  9 |            18 |
+--------+--------------------+---------------+
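
For comparison, here is a minimal Python sketch of what the two functions would report if the value were stored correctly. It assumes only that CHAR_LENGTH() counts characters and LENGTH() counts bytes of the utf8 encoding. The helper names char_length and byte_length are hypothetical stand-ins, and Python's len() models MySQL's behaviour rather than calling MySQL.

```python
def char_length(value: str) -> int:
    """Hypothetical stand-in: characters (code points), what CHAR_LENGTH() counts."""
    return len(value)


def byte_length(value: str) -> int:
    """Hypothetical stand-in: bytes of the UTF-8 encoding, what LENGTH() counts for utf8."""
    return len(value.encode("utf-8"))


for value in ["human", "ははは"]:
    # Each hiragana character takes 3 bytes in UTF-8; ASCII letters take 1.
    print(value, char_length(value), byte_length(value))
# human 5 5
# ははは 3 9
```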

CHAR_LENGTH() should return 3 for the second row, since that is the actual number of UTF-8 characters. However, if one opens the file in a plain text editor, the text appears as "„ÅØ„ÅØ„ÅØ", which is 9 characters, just not 9 meaningful ones. On top of that, LENGTH() certainly shouldn't report 18: three hiragana characters encoded in UTF-8 take only 9 bytes.
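
One possible explanation, offered as an assumption rather than a confirmed diagnosis, is double encoding. The UTF-8 bytes may have been sent over a connection that MySQL treated as a single-byte character set such as latin1, so each of the 9 bytes became its own character and was then re-encoded as utf8 for the column. The Python sketch below models that path, with Python's ISO-8859-1 codec standing in for the connection character set. The helper double_encode is hypothetical. The model reproduces both reported numbers, 9 characters and 18 bytes. Decoding the original UTF-8 bytes as Mac Roman yields exactly "„ÅØ„ÅØ„ÅØ", which hints that the file itself holds valid UTF-8 and the editor is displaying it as Mac Roman.

```python
def double_encode(text: str) -> bytes:
    """Hypothetical helper: UTF-8 bytes misread as a single-byte charset
    (modeled with ISO-8859-1), then stored again as UTF-8."""
    raw = text.encode("utf-8")        # 3 hiragana -> 9 bytes (E3 81 AF each)
    misread = raw.decode("latin-1")   # every byte becomes one character: 9 chars
    return misread.encode("utf-8")    # each of those chars needs 2 bytes: 18 bytes


stored = double_encode("ははは")
print(len(stored.decode("utf-8")), len(stored))  # 9 18
print(stored.hex().upper())  # C3A3C281C2AF repeated three times

# The original UTF-8 bytes viewed through a Mac Roman lens, as a text editor might show them.
print("ははは".encode("utf-8").decode("mac_roman"))  # „ÅØ„ÅØ„ÅØ
```

If this is what happened, running SELECT HEX(thing) FROM multibyte; should show something like C3A3C281C2AF repeated three times for the affected row, instead of E381AF repeated three times.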

So, is this a bug, or is there a correct way to call this that I am simply missing?

Much thanks and appreciation in advance,

Gregor
