Re: How to correct corrupted UTF-8 characters
Posted by: James Cobban
Date: October 23, 2010 04:11PM

I went looking at some other posts on this forum and came across some suggestions of how to diagnose the issue.

As noted the table as created by SQLFront defines all of the component text and char fields as "latin1" even though I asked for utf-8 output.

James Cobban Wrote:
-------------------------------------------------------
>
> CREATE TABLE `tblLR` (
> `IDLR` int(10) NOT NULL AUTO_INCREMENT,
> `FSPlaceID` varchar(255) CHARACTER SET latin1
> DEFAULT NULL,
> `Preposition` varchar(120) CHARACTER SET latin1
> DEFAULT NULL,
> `Location` varchar(255) CHARACTER SET latin1
> DEFAULT NULL,
> `SortedLocation` varchar(255) CHARACTER SET
> latin1 DEFAULT NULL,
> `ShortName` varchar(255) CHARACTER SET latin1
> DEFAULT NULL,
> `Tag1` tinyint(3) unsigned DEFAULT NULL,
> `Used` tinyint(3) unsigned DEFAULT NULL,
> `Notes` longtext CHARACTER SET latin1,
> `Verified` tinyint(3) unsigned DEFAULT NULL,
> `Latitude` double(53,0) DEFAULT NULL,
> `Longitude` double(53,0) DEFAULT NULL,
> `FSResolved` tinyint(3) unsigned DEFAULT NULL,
> `VEResolved` tinyint(3) unsigned DEFAULT NULL,
> `qsTag` tinyint(3) unsigned DEFAULT NULL,
> PRIMARY KEY (`IDLR`)
> ) ENGINE=MyISAM AUTO_INCREMENT=32882 DEFAULT
> CHARSET=utf8
>

Following one of the debugging suggestions on another thread I issued the queries:

mysql> select location, hex(location), length(location), char_length(location) from tblLR where idlr=23319;

| N ½ E ½ lot 18 con ?, Adelaide, Middlesex, ON, CA | 4E20C2BD204520C2BD206C6F7420313820636F6E203F2C204164656C616964652C204D6964646C657365782C204F4E2C204341 | 51 | 51 |


mysql> select location, hex(location), length(location), char_length(location) from tblLR where idlr=23320;

| N Cor E � Lot 16 Con 4, Caradoc, Middlesex, ON, CA | 4E20436F72204520BD204C6F7420313620436F6E20342C2043617261646F632C204D6964646C657365782C204F4E2C204341 | 50 | 50 |

The first record has been manually updated from the web page, and therefore has the utf-8 text from the user. However the second record still has the latin1 text from the imperfect conversion. So now I have a mixed up database with some records in utf-8 and others in latin1 within the same field.

So I tried the following to change the definition of the field:

mysql> ALTER TABLE tblLR MODIFY Location VARCHAR(255) CHARACTER SET utf8;
Query OK, 13474 rows affected (3.32 sec)
Records: 13474 Duplicates: 0 Warnings: 0

mysql> show create table tblLR;
CREATE TABLE `tblLR` (
`IDLR` int(10) NOT NULL AUTO_INCREMENT,
...
`Location` varchar(255) DEFAULT NULL,
...
PRIMARY KEY (`IDLR`)
) ENGINE=MyISAM AUTO_INCREMENT=32882 DEFAULT CHARSET=utf8

mysql> select location, hex(location), length(location), char_length(location) from tblLR where idlr=23319;

| N ½ E ½ lot 18 con ?, Adelaide, Middlesex, ON, CA | 4E20C382C2BD204520C382C2BD206C6F7420313820636F6E203F2C204164656C616964652C204D6964646C657365782C204F4E2C204341 | 55 | 51 |

mysql> select location, hex(location), length(location), char_length(location) from tblLR where idlr=23320;

| N Cor E � Lot 16 Con 4, Caradoc, Middlesex, ON, CA | 4E20436F72204520C2BD204C6F7420313620436F6E20342C2043617261646F632C204D6964646C657365782C204F4E2C204341 | 51 | 50 |

Where the text already contained a UTF8 ½ as 0xC2BD that has now become two characters 0xC382 and 0xC2BD, but the first character displays nothing so the visual appearance of the text is still as expected. However in the second case where the 0xBD has been expanded to 0xC2BD as expected, the string continues to display with a gibberish character.

I don't understand.

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: How to correct corrupted UTF-8 characters
3345
October 23, 2010 04:11PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.