MySQL Forums

Re: Does the STR_TO_DATE function not work with the UCS2 charset?
Posted by: Rick James
Date: March 05, 2013 10:56PM

Notice how some functions in
http://dev.mysql.com/doc/refman/5.5/en/string-functions.html
mention "multi-byte safe".
Perhaps they are the only functions that are multi-byte safe.

ascii, latin1, and a few other character sets use 1 _byte_ per _character_. ucs2 always uses 2 bytes per character, never 1. utf8 uses 1-3 bytes per character.
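
To make the byte counts concrete, here is a small Python sketch (not MySQL code). It assumes Python's "utf-16-be" codec is a fair stand-in for MySQL's ucs2, which holds for characters in the Basic Multilingual Plane; the helper name bytes_per_char is made up for illustration.

```python
# Assumption: Python's "utf-16-be" codec stands in for MySQL's ucs2;
# the two agree for characters in the Basic Multilingual Plane.
ENCODINGS = {
    "ascii": "ascii",
    "latin1": "latin-1",
    "utf8": "utf-8",
    "ucs2": "utf-16-be",
}

def bytes_per_char(ch, charset):
    """Return the byte count for one character, or None if the
    charset cannot represent that character at all."""
    try:
        return len(ch.encode(ENCODINGS[charset]))
    except UnicodeEncodeError:
        return None

# A digit, an accented letter, and the euro sign.
for ch in ["7", "é", "€"]:
    sizes = {name: bytes_per_char(ch, name) for name in ENCODINGS}
    print(repr(ch), sizes)
```

The digit takes 1 byte everywhere except in the ucs2 stand-in, where it takes 2.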

The byte-encoding of characters in English text (letters, numbers, punctuation) is identical for ascii, latin1, and utf8. STR_TO_DATE() works with a byte stream and expects that encoding. In particular, ucs2 encodes each digit as 2 bytes (a zero byte followed by the ascii byte), and that confuses it.
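
The effect can be imitated outside MySQL. The Python sketch below is only an analogy, not how STR_TO_DATE() is implemented: naive_parse is a hypothetical parser that treats every byte as one character, and "utf-16-be" is assumed to match ucs2 for this text.

```python
from datetime import datetime

text = "2013-03-05"

# English text encodes to the same bytes in these three character sets.
ascii_bytes = text.encode("ascii")
latin1_bytes = text.encode("latin-1")
utf8_bytes = text.encode("utf-8")

# Assumption: "utf-16-be" stands in for ucs2; every digit becomes
# a zero byte followed by its ascii byte.
ucs2_bytes = text.encode("utf-16-be")

def naive_parse(raw):
    """Hypothetical byte-oriented date parser: maps each byte to one
    character and ignores the real character set of the input."""
    try:
        return datetime.strptime(raw.decode("latin-1"), "%Y-%m-%d")
    except ValueError:
        return None

print(ucs2_bytes)               # zero bytes interleaved with the digits
print(naive_parse(utf8_bytes))  # parses as a date
print(naive_parse(ucs2_bytes))  # None: the zero bytes break the match
```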

Generic string manipulation functions (LEFT, MID, ...) deliberately work with 'characters', hence they go to the effort of honoring the CHARACTER SET.
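
A rough Python illustration of why that effort matters. The functions left_chars and left_bytes are hypothetical stand-ins, not MySQL internals: one honors the character set the way LEFT() does, the other just cuts bytes.

```python
def left_chars(raw, n, charset):
    """Hypothetical character-aware LEFT: decode with the given
    charset, keep the first n characters, and re-encode."""
    return raw.decode(charset)[:n].encode(charset)

def left_bytes(raw, n):
    """Hypothetical charset-blind variant: keep the first n bytes."""
    return raw[:n]

raw = "März".encode("utf-8")          # 'ä' occupies 2 bytes in utf-8

print(left_chars(raw, 2, "utf-8"))    # the bytes for 'Mä'
print(left_bytes(raw, 2))             # 'M' plus only half of 'ä'
```

The byte-based cut leaves a truncated multi-byte sequence, which is no longer valid utf8.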

LENGTH() and CHAR_LENGTH() give you byte count and character count, respectively.
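
The same distinction, sketched in Python as an analogy. byte_length and char_length are made-up names that mimic what LENGTH() and CHAR_LENGTH() report, and "utf-16-be" is again assumed to behave like ucs2 for this word.

```python
def byte_length(text, encoding):
    """Analogue of LENGTH(): bytes used once the text is encoded."""
    return len(text.encode(encoding))

def char_length(text):
    """Analogue of CHAR_LENGTH(): number of characters."""
    return len(text)

word = "café"

print(char_length(word))               # same for every character set
print(byte_length(word, "latin-1"))    # 1 byte per character
print(byte_length(word, "utf-8"))      # 'é' needs 2 bytes
print(byte_length(word, "utf-16-be"))  # 2 bytes per character
```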

Functions (especially date-related ones) that look at the actual characters seem (as you found out) to be 'dumb': they do not honor the CHARACTER SET, but simply look at the bytes. However, since virtually all MySQL installations use ascii, one of the latin character sets (latin1, latin2, ...), utf8, or utf8mb4, most users never run into the trouble you encountered.

More discussion:
http://mysql.rjweb.org/doc.php/charcoll

