Weird utf-8 encoding problem using webserver that does not occur in command line client
Posted by: Stefan Helders
Date: June 29, 2009 02:52AM

Hello all,

I am facing a weird problem with utf-8 encoding for which I have not found any solution yet on the forum (even after studying the chapter 9.1 of the manual and the sticky threads about this topic).

The system configuration is as follows:

Windows XP SP2
Apache: 2.2.11
PHP: 5.2.10
MySQL: 5.1.35 (using MyISAM tables)

(This problem did not appear with Apache 1.4, PHP 4.3, MySQL 4.1)

MySQL environment variables concerning character set according to chapter "9.1 Character Set Support" of the manual:

mysql> SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
+--------------------------+--------+

The problem is that an sql query that ought to display the same character in two different ways actually displays two different characters. The weird thing is that the problem only appears if the sql query is issued via webserver. If it is issued via MySQL command line client, the characters that are displayed do not differ, as you can see.


mysql> select CHAR(50603), substr(name,5,1) from names where nameid=19028;
+-------------+------------------+
| CHAR(50603) | substr(name,5,1) |
+-------------+------------------+
| +½ | +½ |
+-------------+------------------+
1 row in set (0.00 sec)

Also in the database table itself the character appears correctly. I have checked that with a hex editor.

Using PHP/Apache the character is reduced to a question mark if is read out from the table. Here is the result:

pure character: ū, length: 2, code: 50603
pure character: ?, length: 1, code: 63

The code snippet in PHP is as follows:

$sql="select CHAR(50603), substr(name,5,1) from names where nameid=19028";
$result=mysql_query($sql); $err=mysql_error();
if(!empty($err)) print("Error in SQL $sql: $err");
if($row=mysql_fetch_array($result,MYSQL_NUM)) {
$char1=$row[0]; $char2=$row[1];
$len1=strlen($row[0]); $len2=strlen($row[1]);
if($len1==1) $ord1=ord($char1); else $ord1=ord($char1)*256+ord(substr($char1,1));
if($len2==1) $ord2=ord($char2); else $ord2=ord($char2)*256+ord(substr($char2,1));
print("pure character: $char1, length: $len1, code: $ord1<br>\n");
print("pure character: $char2, length: $len2, code: $ord2<br>\n");
}

Since it is possible to display the character correctly when selecting it directly, I would exclude that it is a problem of PHP or Apache.

The questions now are:
Why is the character reduced if it is read out from the table?
How is it possible to avoid this reduction?

Imposing utf-8 character set on the database/table/columns does not change anything.

Any ideas?

Options: ReplyQuote




Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.