Re: Fixing Double Enocoding With mysqldump
Posted by: Peter Berry
Date: October 23, 2011 03:07PM

Hey Rick -

This site in question is built with CakePHP. Just as you describe, when reading back and displaying data on web pages the process reverses and the characters show up properly as utf8. The only reason I noticed was that searching for plain characters was not returning the accented versions even though the collation is utf8_general_ci. This makes perfect sense with double encoding as those characters are not actually utf8 in the database (or rather they are improperly encoded as different characters).

There is an optional 'encoding' parameter that you pass to CakePHP, but the template that comes with the framework doesn't set it and I stupidly didn't add it. I think my mistake was assuming it would default to the charset for the database rather than than server, but it's just asking for trouble not to set it explicitly.

The encoding issue is site wide, but here is the output of SHOW CREATE TABLE for a typical table:

CREATE TABLE `addresses` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`address_label_lookup_id` int(11) DEFAULT NULL,
`contact_id` int(11) NOT NULL,
`organization` varchar(250) DEFAULT NULL,
`department` varchar(250) DEFAULT NULL,
`title` varchar(250) DEFAULT NULL,
`first_name` varchar(100) DEFAULT NULL,
`last_name` varchar(100) DEFAULT NULL,
`address_line1` varchar(250) NOT NULL,
`address_line2` varchar(250) DEFAULT NULL,
`postal_code` varchar(100) NOT NULL,
`city` varchar(100) NOT NULL,
`state_id` int(11) DEFAULT NULL,
`state_other` varchar(100) DEFAULT NULL,
`country_id` int(11) DEFAULT NULL,
`country_other` varchar(100) DEFAULT NULL,
`primary_address` tinyint(1) NOT NULL,
`deleted` tinyint(1) NOT NULL,
`created` datetime NOT NULL,
`modified` datetime NOT NULL,
`created_by_user_id` int(11) NOT NULL,
`modified_by_user_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_addresses_city` (`city`),
KEY `idx_addresses_state_id` (`state_id`),
KEY `idx_addresses_country_id` (`country_id`),
KEY `idx_addresses_contact_id` (`contact_id`),
KEY `idx_addresses_postal_code` (`postal_code`)
) ENGINE=InnoDB AUTO_INCREMENT=6357 DEFAULT CHARSET=utf8;

And here is the (edited for format) output of SHOW VARIABLES LIKE 'char%';

character_set_client = latin1
character_set_connection = latin1
character_set_database = utf8
character_set_filesystem = binary
character_set_results = latin1
character_set_server = latin1
character_set_system = utf8
character_sets_dir = c:\devtools\wamp\bin\mysql\mysql5.1.36\share\charsets\

The connection is made via a dedicated user, not root, and the web server process is also owned by a dedicated user.

Actually, for this site there are so many places things could have gone wrong I'm surprised it's not worse. Most of the data was originally exported from Filemaker, imported to temporary MySQL tables via phpMyAdmin, and then processed via php scripts running as part of the web app (using the above mentioned default charset of latin1). Also there are the 3 different server environments... but I digress.

I'm going to do more rigorous test of this fix this week. I'll post the results.

Thanks again for all the information here, it's been a huge help.

Options: ReplyQuote


Subject
Views
Written By
Posted
5417
October 20, 2011 04:59PM
Re: Fixing Double Enocoding With mysqldump
2255
October 23, 2011 03:07PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.