UTF-8 vs UCS-2 (especially on ndb)
Posted by: Mirko Raner
Date: April 10, 2012 06:53PM

I'm in the process of migrating an InnoDB database to ndb, and we frequently run into the 14000-byte row size limit imposed by ndb. Our schema has quite a few VARCHAR columns, and we're using the utf8 character set.
My question is: what (if any) is the benefit of using UTF-8 over UCS-2?
Since VARCHAR needs to allocate memory for the worst case (i.e., the maximum declared length), a VARCHAR in UTF-8 requires 3 times the declared length in bytes, whereas in UCS-2 it requires only 2 times the length. UTF-8 saves space when most characters are one or two bytes long, but if the storage mechanism has to reserve three bytes per character anyway, UCS-2 seems to be the better choice.
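To check the per-character widths behind this reasoning, here is a minimal Python sketch. It assumes UCS-2 can be approximated by UTF-16-LE, which produces identical bytes for characters in the Basic Multilingual Plane (the only range UCS-2 covers). The sample characters and helper names are illustrative only.

```python
# Compare per-character byte widths of UTF-8 and UCS-2 for sample BMP characters.
# Assumption: UCS-2 is approximated by UTF-16-LE, which is byte-identical for
# BMP code points (U+0000..U+FFFF, excluding surrogates).

SAMPLES = {
    "ASCII": "A",
    "Latin-1 accent": "\u00e9",  # é
    "Greek": "\u03a9",           # Ω
    "CJK": "\u4e2d",             # 中
    "Euro sign": "\u20ac",       # €
}


def utf8_width(ch: str) -> int:
    """Number of bytes ch occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def ucs2_width(ch: str) -> int:
    """Number of bytes ch occupies in UCS-2 (always 2 for BMP characters)."""
    # UCS-2 cannot represent characters outside the Basic Multilingual Plane.
    if ord(ch) > 0xFFFF:
        raise ValueError(f"{ch!r} is outside the BMP")
    return len(ch.encode("utf-16-le"))


for name, ch in SAMPLES.items():
    print(f"{name:15} U+{ord(ch):04X}  utf8={utf8_width(ch)}  ucs2={ucs2_width(ch)}")

# Every non-surrogate BMP character fits in at most 3 UTF-8 bytes,
# which is why a 3-byte-per-character utf8 column covers the same range as UCS-2.
bmp = (chr(c) for c in range(0x10000) if not 0xD800 <= c <= 0xDFFF)
print("max UTF-8 width in BMP:", max(utf8_width(ch) for ch in bmp))
```

ASCII takes 1 byte in UTF-8 versus 2 in UCS-2, accented Latin and Greek letters take 2 bytes in both, and characters such as CJK ideographs or the euro sign take 3 bytes in UTF-8 versus 2 in UCS-2.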
Am I overlooking something here? It seems like using UTF-8 for VARCHARs is a waste of space, which is especially problematic for MySQL Cluster given its smaller row size limit of 14000 bytes.
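The effect on the 14000-byte budget can be estimated with a rough Python sketch. It assumes the usual MySQL rule that a VARCHAR reserves (declared length × maximum bytes per character) plus a 1-byte length prefix when that is at most 255 bytes, or 2 bytes otherwise. It deliberately ignores ndb-specific per-row overhead, alignment padding, and non-VARCHAR columns, so treat the numbers as upper bounds rather than what ndb will actually accept. The VARCHAR(255) column size is a hypothetical example.

```python
# Rough estimate of how many VARCHAR columns fit in an ndb row for utf8 vs ucs2.
# Assumptions: worst-case reservation = length * max bytes/char + length prefix;
# ndb per-row overhead and alignment are ignored (real capacity will be lower).

ROW_LIMIT_BYTES = 14000  # row size limit quoted in the post

# Maximum bytes per character reserved for each MySQL character set.
MAX_BYTES_PER_CHAR = {"utf8": 3, "ucs2": 2}


def varchar_worst_case(length: int, charset: str) -> int:
    """Worst-case bytes reserved for VARCHAR(length) in the given charset."""
    data = length * MAX_BYTES_PER_CHAR[charset]
    prefix = 1 if data <= 255 else 2  # length prefix grows to 2 bytes past 255
    return data + prefix


def columns_that_fit(length: int, charset: str, limit: int = ROW_LIMIT_BYTES) -> int:
    """How many VARCHAR(length) columns fit within the row limit."""
    return limit // varchar_worst_case(length, charset)


for charset in ("utf8", "ucs2"):
    width = varchar_worst_case(255, charset)
    print(f"VARCHAR(255) {charset}: {width} bytes, "
          f"{columns_that_fit(255, charset)} columns fit")
```

Under these assumptions a VARCHAR(255) reserves 767 bytes in utf8 but 512 bytes in ucs2, so roughly 18 versus 27 such columns would fit in 14000 bytes.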
Any insights?
