Byte level storage seems right, display seems wrong
Posted by: Andy Theuninck
Date: February 25, 2016 11:35AM

I'm using PHP to insert records, but I don't *think* this is a PHP issue. I've got a table named utfTable using utf8 and a table named latinTable using latin1. I'm inserting "á" characters and trying to understand what's happening.

I'm loading byte sequences from text files to [hopefully] avoid having PHP screw anything up. My latin1 source file contains one byte, 0xE1. My utf8 source file contains two bytes, 0xC3A1.
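For reference, here is a short Python sketch (not part of the original setup) confirming that those two files hold the expected encodings of the same character, U+00E1:

```python
# The character in question: U+00E1 LATIN SMALL LETTER A WITH ACUTE ("á")
ch = "\u00e1"

latin1_bytes = ch.encode("latin-1")  # one byte in latin1
utf8_bytes = ch.encode("utf-8")      # two bytes in utf8

print(latin1_bytes.hex().upper())  # E1
print(utf8_bytes.hex().upper())    # C3A1
```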

When I connect to the database via PHP and use SHOW VARIABLES, character_set_connection and character_set_client both report latin1.

First, I loaded the latin1 string from file and inserted it into both tables. Using HEX() in MySQL, utfTable reports the value stored as 0xC3A1 and latinTable reports the value stored as 0xE1. This makes sense to me: the client sent the string in the encoding the server was expecting, and the server translated it as needed to each table's character set.
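The server-side translation described above can be modelled roughly as "decode with the client's declared charset, re-encode with the column's charset". This is a simplified sketch, not MySQL's actual code; Python's `latin-1` codec stands in for MySQL's latin1, which is really closer to cp1252, though the two agree on every byte used in this post. The `convert` helper is hypothetical.

```python
def convert(data: bytes, from_charset: str, to_charset: str) -> bytes:
    """Hypothetical model of a charset conversion: interpret the bytes in the
    charset they are declared to be in, then re-encode for the target."""
    return data.decode(from_charset).encode(to_charset)

# latin1 "á", sent while character_set_client = latin1
sent = b"\xe1"

utf_table_value = convert(sent, "latin-1", "utf-8")      # utf8 column
latin_table_value = convert(sent, "latin-1", "latin-1")  # latin1 column: unchanged

print(utf_table_value.hex().upper())    # C3A1
print(latin_table_value.hex().upper())  # E1
```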

Next, I loaded the utf8 string from file and inserted it into both tables. This time HEX() reports utfTable storing the value as 0xC383C2A1 and latinTable storing it as 0xC3A1. Again, this mostly makes sense to me: the client sent the string in an unexpected encoding. Translating that non-latin1 string from latin1 to utf8 produces a weird value in utfTable, while latinTable just stores the bytes as is.
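The "weird value" is the classic double-encoding pattern. A minimal Python sketch, under the same simplifying assumption (Python's `latin-1` codec standing in for MySQL's latin1), shows how each utf8 byte gets read as a separate latin1 character and then re-encoded:

```python
# utf8 bytes for "á", but the connection declares them to be latin1
sent = b"\xc3\xa1"

# Read as latin1, the two bytes become two characters: "Ã" (U+00C3) and "¡" (U+00A1)
misread = sent.decode("latin-1")

utf_table_value = misread.encode("utf-8")      # each character becomes two utf8 bytes
latin_table_value = misread.encode("latin-1")  # round-trips to the original bytes

print(misread)                          # Ã¡
print(utf_table_value.hex().upper())    # C383C2A1
print(latin_table_value.hex().upper())  # C3A1
```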

Here's where I get confused: if I connect to the database via the CLI mysql client, it also reports character_set_connection and character_set_client as latin1. But the query results look like the exact opposite of what I'd expect.

mysql> SELECT col, HEX(col) FROM utfTable \G
*************************** 1. row ***************************
col: {gray square}
HEX(col): C3A1
*************************** 2. row ***************************
col: á
HEX(col): C383C2A1

mysql> SELECT col, HEX(col) FROM latinTable \G
*************************** 1. row ***************************
col: {gray square}
HEX(col): E1
*************************** 2. row ***************************
col: á
HEX(col): C3A1

The values that appear to be stored correctly at the byte level for each table's character set don't display correctly, while the values that appear to be stored wrongly do display correctly. Is HEX() really showing me what's stored at the byte level? Could something else, such as my terminal app's character set, be responsible for the display issues?
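One way to test the terminal hypothesis offline is to simulate the full round trip. The sketch below rests on two assumptions not confirmed in the post: that character_set_results is also latin1 in the CLI session, and that the terminal decodes its input as UTF-8. The `shown_on_terminal` helper is hypothetical. Under those assumptions, the stored bytes reproduce exactly the display observed above.

```python
# Stored bytes as reported by HEX(), paired with each column's charset
rows = [
    ("utfTable", "utf-8", bytes.fromhex("C3A1")),
    ("utfTable", "utf-8", bytes.fromhex("C383C2A1")),
    ("latinTable", "latin-1", bytes.fromhex("E1")),
    ("latinTable", "latin-1", bytes.fromhex("C3A1")),
]

def shown_on_terminal(stored: bytes, column_charset: str,
                      result_charset: str = "latin-1",    # ASSUMED character_set_results
                      terminal_charset: str = "utf-8") -> str:  # ASSUMED terminal encoding
    # Server side: convert from the column charset to the result charset.
    sent = stored.decode(column_charset).encode(result_charset)
    # Terminal side: interpret the received bytes in the terminal's charset.
    # Invalid sequences become U+FFFD, which many terminals draw as a box.
    return sent.decode(terminal_charset, errors="replace")

for table, charset, stored in rows:
    print(table, stored.hex().upper(), repr(shown_on_terminal(stored, charset)))
```

If this model holds, the correctly stored values are sent to the client as the single latin1 byte 0xE1, which is not valid UTF-8 on its own, while the double-encoded values arrive as 0xC3A1, which a UTF-8 terminal happens to render as "á".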


Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.