Re: MySQLAdministrator / MySQLDump generates ANSI instead of UTF8 - why ?
Posted by: Rick James
Date: September 27, 2009 12:51PM

Files do not have an encoding associated with them. The best that programs can do is to read the file and make a guess.

True, ASCII encodings are a subset of UTF8 encodings: any pure-ASCII file is also a valid utf8 file, byte for byte.
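A quick way to convince yourself of the subset claim is a minimal Python sketch like the one below. The sample string is made up for illustration; any text limited to 7-bit ASCII characters should behave the same way.

```python
# Hypothetical sample: a line of plain-ASCII SQL.
text = "SELECT * FROM t;"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For pure-ASCII text the two encodings produce identical bytes.
same = ascii_bytes == utf8_bytes
print(same)
```

This is also why a tool that guesses encodings cannot tell "ASCII" from "utf8" when the file happens to contain only English text.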

The UNICODE encoding is probably a 16-bit way of storing strings (what is usually called UTF-16 or UCS-2) -- two bytes per 'character'. It is NOT "utf8", but it can (mostly) be easily converted to/from utf8.
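To make the difference concrete, here is a small Python sketch. It assumes the 16-bit encoding in question is little-endian UTF-16, which is a guess about what the tool calls "UNICODE"; the sample word is arbitrary.

```python
# Arbitrary sample with one non-ASCII letter (e with acute accent).
text = "caf\u00e9"

utf16 = text.encode("utf-16-le")  # two bytes for each of these 4 characters
utf8 = text.encode("utf-8")       # one byte each, except two for the accented letter

# Converting between the two goes through the decoded string.
round_trip = utf16.decode("utf-16-le").encode("utf-8")

print(len(utf16), len(utf8), round_trip == utf8)
```

The "(mostly)" matters: characters outside the 16-bit range need a pair of 16-bit units in UTF-16, so "two bytes per character" does not hold for them.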

Utf8 uses 1-3 bytes per character for most text, and 4 bytes for characters outside the 16-bit range. (MySQL's "utf8" character set stores only the 1-3 byte forms.) ("Byte" is an 8-bit thingie; 'character' has no particular size or encoding.) English text takes 1 byte per character in utf8; Chinese takes 3 (usually). (But then, a Chinese character often represents a whole 'word'.)
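The byte counts can be checked directly. This Python sketch uses four sample characters picked only for illustration: an English letter, an accented Latin letter, a Chinese character, and an emoji.

```python
# Illustrative samples: "A", e-acute, the Chinese character for "middle", a smiley emoji.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

# Number of bytes each character occupies once encoded as UTF-8.
sizes = [len(ch.encode("utf-8")) for ch in samples]

print(sizes)  # [1, 2, 3, 4]
```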

The .sql file you generated is just a bunch of bits. If the writer faithfully encoded stuff in utf8 to write to it, and the reader assumes utf8, then all is well. But it is up to you to be sure the writer and reader both have the same SET NAMES (or whatever); there is nothing in the file, itself, nor in the directory for the file that says "this is utf8-encoded". (Files and file systems were invented decades ago by English-speaking people. Even ASCII encoding took a couple of decades to be invented. In the early days NO machine had even a megabyte of memory; characters were crammed into 5 or 6 bits as a cost savings. Moving to 8-bit bytes raised the price for storing English text by a huge amount.)
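What goes wrong when writer and reader disagree can be sketched in a few lines of Python. This is only an illustration, not what mysqldump does internally; the sample word and the choice of latin1 as the "wrong" assumption are mine.

```python
# Sample text with one non-ASCII letter (n with tilde).
original = "Se\u00f1or"

# The writer faithfully encodes as utf8.
data = original.encode("utf-8")

# A reader that assumes utf8 gets the text back unchanged.
right = data.decode("utf-8")

# A reader that assumes latin1 sees two characters where there was one.
wrong = data.decode("latin-1")

print(right)
print(wrong)
```

The bytes are identical in both cases; only the reader's assumption differs, and nothing in the file tells the reader which assumption is correct.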

