Re: How to correct corrupted UTF-8 characters
I don't really understand your question. I know almost nothing about the MS Access database that I am converting from, or about how MS Access stores text. I am assuming that it uses Microsoft's 16-bit character encoding, which is similar to UTF-16 with the low-order byte first, in which case the ½ (U+00BD) should be represented internally as the bytes 0xBD 0x00. But I don't know whether that is the case, because I don't really want to spend time learning MS Access. I used SQLFront 5.1 build 4.16 to convert the .mdb file to MySQL tables, specifying that I wanted UTF-8 output. SQLFront created tables such as:
CREATE TABLE `tblLR` (
`IDLR` int(10) NOT NULL AUTO_INCREMENT,
`FSPlaceID` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`Preposition` varchar(120) CHARACTER SET latin1 DEFAULT NULL,
`Location` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`SortedLocation` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`ShortName` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`Tag1` tinyint(3) unsigned DEFAULT NULL,
`Used` tinyint(3) unsigned DEFAULT NULL,
`Notes` longtext CHARACTER SET latin1,
`Verified` tinyint(3) unsigned DEFAULT NULL,
`Latitude` double(53,0) DEFAULT NULL,
`Longitude` double(53,0) DEFAULT NULL,
`FSResolved` tinyint(3) unsigned DEFAULT NULL,
`VEResolved` tinyint(3) unsigned DEFAULT NULL,
`qsTag` tinyint(3) unsigned DEFAULT NULL,
PRIMARY KEY (`IDLR`)
) ENGINE=MyISAM AUTO_INCREMENT=32882 DEFAULT CHARSET=utf8
Despite the references to "latin1", the varchar and text fields otherwise contain valid UTF-8. The only problem, as I said, is that the characters from the upper half of page 0 (0x80 to 0xFF, the ones with the high-order bit on) were left as single bytes instead of being translated to their two-byte UTF-8 form: 0xhh should have become 0xC2hh for 0x80 to 0xBF, and 0xC3 followed by hh minus 0x40 for 0xC0 to 0xFF. I don't really care why SQLFront translated the characters that way; I just want to search and replace all occurrences of the badly translated characters. I could write a program to do the search and replace, but I was hoping there was some simpler way to do it. I tried explicitly redefining the character set of one of the fields to utf8, and MySQL reported that it had converted the column, but no change was made to the data.
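For reference, the search-and-replace program mentioned above could be quite small. The following is only a rough sketch in Python, assuming the raw column bytes are mostly valid UTF-8 with occasional stray single bytes in the 0x80 to 0xFF range. The names `latin1_fallback` and `repair_mixed_utf8` are made up for this example, and fetching the bytes from MySQL and writing them back is not shown.

```python
import codecs


def _latin1_fallback(err):
    """Error handler: reinterpret undecodable bytes as Latin-1 characters."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Each stray byte 0xhh maps to the code point U+00hh.
    return bad.decode("latin-1"), err.end


# "latin1_fallback" is a hypothetical handler name chosen for this sketch.
codecs.register_error("latin1_fallback", _latin1_fallback)


def repair_mixed_utf8(raw: bytes) -> bytes:
    """Return clean UTF-8 for bytes that mix valid UTF-8 with stray Latin-1 bytes."""
    text = raw.decode("utf-8", errors="latin1_fallback")
    return text.encode("utf-8")
```

With this, `repair_mixed_utf8(b"1\xbd acres")` yields `b"1\xc2\xbd acres"`, while input that is already correct UTF-8 passes through unchanged. The approach is a heuristic: two stray bytes that happen to form a valid UTF-8 sequence (for example 0xC3 0xA9) would be kept as one character rather than converted individually.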