Cannot get mysqldump to produce utf-8 encoded files!
I cannot backup my utf-8 encoded database to an utf-8 encoded dumpfile, thus also not being able to restore the database without ending up with broken special characters. I have been searching google and this forum extensively without finding a solution to this problem.
I have a database in which all tables are created with:
CREATE TABLE `name` (
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
All data inserted into the database is done by PHP. To ensure only utf-8 encoded data is inserted, I use SET NAMES 'utf8' upon every connection and have these settings in php.ini:
mbstring.func_overload = 7
mbstring.internal_encoding = UTF-8
mbstring.http_input = auto
For my.cnf I use:
character-set-server=utf8
default-collation=utf8_unicode_ci
default-character-set=utf8
collation-server=utf8_unicode_ci
Despite this, running:
$ mysqldump -uroot -hlocalhost the_db_name -r supposed_utf8_dumpfile --verbose -p --default-character-set=utf8
Returns a dumpfile with the special chars like äöå all jarbled up. That the file is not UTF-8 encoded is confirmed by runing:
$ enca -L none supposed_utf8_dumpfile
Which replies: Unrecognized encoding
Also, viewing with less shows the jarbled characters where öåäö is supposed to be. ( ä = ä , ö = ö, Ö = Ö etc)
iconv-conversion from latin1, cp1252 or any common iso-format will not work, most propable because of the encoding not being any of those encodings from the beginning...
The dumpfile's top lines are as follows:
-- MySQL dump 10.10
--
-- Host: localhost Database: the_db_name
-- ------------------------------------------------------
-- Server version 5.0.27-Debian_0.dotdeb.1-log
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
The system is Ubuntu Dapper server with locale: en_US.UTF-8. Version of mysql is: "mysql Ver 14.12 Distrib 5.0.27, for pc-linux-gnu (i386) using readline 4.3"
Some related information:
"mysqldump --help | grep char" shows:
default-character-set utf8
set-charset TRUE
"mysql --help | grep char" yields:
default-character-set latin1
"mysql --default-character-set=utf8 --help | grep char" yields:
default-character-set utf8
"mysql -uroot -hlocalhost -p
> SHOW VARIABLES; " yields (amongst others of course):
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| collation_connection | latin1_swedish_ci |
| collation_database | utf8_unicode_ci |
| collation_server | utf8_unicode_ci |
"mysql -uroot -hlocalhost -p
> SET NAMES 'utf8';
> SHOW VARIABLES;" yields:
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| collation_connection | utf8_general_ci |
| collation_database | utf8_unicode_ci |
| collation_server | utf8_unicode_ci |
(Btw - why is collation_connection=utf8_general_ci and not utf8_unicode_ci?)
I feel that I have done most of what can be expected to solve this problem, except for reverting to hackish php-scripts or other software to do what mysqldump is supposed to do.
Please shed a light on this subject - what am I missing?
EDIT: Just to make things clear. I can do:
$ echo "test utf8 åäö" > testutf8
$ enca -L none testutf8
..and get the reply: Universal transformation format 8 bits; UTF-8
I can also view the contents of testutf8 with less without jarbled characters. This type of properties is what I expect the dumpfile to have.
Subject
Views
Written By
Posted
Cannot get mysqldump to produce utf-8 encoded files!
14801
December 06, 2006 02:39AM
6341
May 03, 2007 06:09AM
10703
July 28, 2007 07:20AM
Sorry, you can't reply to this topic. It has been closed.
Content reproduced on this site is the property of the respective copyright holders.
It is not reviewed in advance by Oracle and does not necessarily represent the opinion
of Oracle or any other party.