Re: How to correct corrupted UTF-8 characters
Posted by: James Cobban
Date: October 23, 2010 03:37PM

I don't really understand your question. I know almost nothing about the MS Access database that I am converting from, or about how MS Access stores text. I am assuming that it uses Microsoft's 16-bit character encoding, which is similar to UTF-16 with the low-order byte first, in which case the ½ (U+00BD) should be stored internally as the bytes 0xBD 0x00. But I don't know whether that is the case, because I don't really want to spend time learning MS Access. I used SQLFront 5.1 build 4.16 to convert the .mdb file to MySQL tables, specifying that I wanted UTF-8 output. SQLFront created tables such as:

CREATE TABLE `tblLR` (
`IDLR` int(10) NOT NULL AUTO_INCREMENT,
`FSPlaceID` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`Preposition` varchar(120) CHARACTER SET latin1 DEFAULT NULL,
`Location` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`SortedLocation` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`ShortName` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`Tag1` tinyint(3) unsigned DEFAULT NULL,
`Used` tinyint(3) unsigned DEFAULT NULL,
`Notes` longtext CHARACTER SET latin1,
`Verified` tinyint(3) unsigned DEFAULT NULL,
`Latitude` double(53,0) DEFAULT NULL,
`Longitude` double(53,0) DEFAULT NULL,
`FSResolved` tinyint(3) unsigned DEFAULT NULL,
`VEResolved` tinyint(3) unsigned DEFAULT NULL,
`qsTag` tinyint(3) unsigned DEFAULT NULL,
PRIMARY KEY (`IDLR`)
) ENGINE=MyISAM AUTO_INCREMENT=32882 DEFAULT CHARSET=utf8

Despite the references to "latin1", the varchar and text fields all contain otherwise valid UTF-8. The only problem, as I said, is that the characters from the portion of page 0 with the high-order bit on (0x80 through 0xFF) were for some reason not translated from the single byte 0xhh to the two-byte UTF-8 form: 0xC2 0xhh for 0x80 through 0xBF, and 0xC3 followed by 0xhh minus 0x40 for 0xC0 through 0xFF. I don't really care why SQLFront translated the characters that way; I just want to search and replace all occurrences of the badly translated characters. I could write a program to do the search and replace; I was just hoping there was some simpler way to do it. I tried explicitly redefining the character set of one of the fields to UTF8, and MySQL said it had converted it, but no change was made to the data.
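
For what it is worth, the kind of program described above could be quite short. The following is only a rough Python sketch, not something tested against the real tables: it assumes the raw bytes of each field have already been fetched from MySQL as a bytes object, and the names repair_mixed_utf8 and "latin1_fallback" are made up for illustration. It decodes the bytes as UTF-8 and, wherever a byte is not part of a valid UTF-8 sequence, treats that byte as a single Latin-1 character instead.

```python
import codecs


def _latin1_fallback(err):
    """Error handler: reinterpret the undecodable bytes as Latin-1."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Latin-1 maps every byte 0x00-0xFF to the code point of the same value.
    return bad.decode("latin-1"), err.end


# "latin1_fallback" is a hypothetical name chosen for this sketch.
codecs.register_error("latin1_fallback", _latin1_fallback)


def repair_mixed_utf8(raw: bytes) -> str:
    """Decode bytes that are mostly UTF-8 but contain stray Latin-1 bytes."""
    return raw.decode("utf-8", errors="latin1_fallback")


if __name__ == "__main__":
    broken = b"1\xbd acres near caf\xc3\xa9"  # lone 0xBD plus a valid 0xC3 0xA9
    fixed = repair_mixed_utf8(broken)
    print(fixed)
    print(fixed.encode("utf-8"))  # bytes to write back to the column
```

One caveat with this approach: if a field legitimately contains two adjacent Latin-1 characters whose bytes happen to form a valid UTF-8 sequence (for example 0xC2 followed by 0xBD), they will be read as one UTF-8 character, so the results should be spot-checked before updating the tables.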
