MySQL Forums

Re: Northwind DB and character encoding
Posted by: Steve Claflin
Date: May 08, 2014 07:11AM

Thanks for the helpful info, particularly the link.

I can now get to my expected result, but am still not sure of exactly what's going on.

One issue with my first attempt was that I had started mysqld directly from my MySQL bin directory, but then changed my command prompt to the directory containing the sql file and ran mysql from there. It turns out that my system path pointed to an earlier version of MySQL, so that older client was the one that ran, and it produced different results from the ones below, which were run with the correct version of mysql.exe.

SQL file values as shown in a hex editor:

Company Name: Berglunds snabbköp 426572676C756E647320736E6162626BF670
Address: Berguvsvägen 4265726775767376E467656E
City: Luleå 4C756C65E5


Concentrating on the city: from what I understand, E5 is the correct latin1 code for the lowercase "a with circle" (å).
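As a quick cross-check of that byte, here is a minimal Python sketch (Python is used purely for illustration; it was not part of the original setup, and the helper name hex_of is hypothetical) that encodes the city name as latin1 and prints the hex:

```python
def hex_of(text: str, encoding: str = "latin1") -> str:
    """Return the uppercase hex of `text` encoded with `encoding`."""
    return text.encode(encoding).hex().upper()


# "\u00e5" is the lowercase a with ring (the last letter of the city name).
city_hex = hex_of("Lule\u00e5")
print(city_hex)  # 4C756C65E5
```

The last byte, E5, matches what the hex editor shows for the city in the SQL file.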

The SQL file creates the database this way:

CREATE DATABASE Northwind DEFAULT CHARACTER SET latin1;

The following test scenarios were run in a command prompt window:


I.
--------------------------

chcp said code page was 437 - left it that way

No set names used

Ran the sql file

City looks wrong, and hex has wrong value:
| Luleσ | 4C756C65D5 |

Deleted row, pasted in insert statement from file opened in Crimson Editor

City looks right, and has expected hex value:
| Luleå | 4C756C65E5 |
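One explanation that is consistent with these bytes (an assumption, not something confirmed in this thread) is that, without SET NAMES, the client's input was treated as cp850 and converted to latin1 on the way in. The Python sketch below only replays that byte arithmetic; the function name and the cp850 guess are illustrative.

```python
def store_as_latin1(raw: bytes, client_charset: str) -> bytes:
    """Mimic converting client bytes into a latin1 column.

    ASSUMPTION: `client_charset` is whatever the connection believes the
    client is sending; cp850 is a guess that reproduces the reported bytes.
    """
    return raw.decode(client_charset).encode("latin1")


# Byte E5 from the SQL file, read as cp850, is capital O with tilde,
# which is D5 in latin1.
from_file = store_as_latin1(b"Lule\xe5", "cp850")

# The a-with-ring typed or pasted in a cp437 console is byte 86,
# and byte 86 is also a-with-ring in cp850, so it converts to E5.
from_paste = store_as_latin1("Lule\u00e5".encode("cp437"), "cp850")

print(from_file.hex().upper(), from_paste.hex().upper())
```

Under that assumption the sourced file ends in D5 and the pasted statement ends in E5, which is what the two result rows show.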


II.
--------------------------
Did not run chcp

SET NAMES latin1;

Ran the sql file

City looks wrong, but hex has right value:
| Luleσ | 4C756C65E5 |

Deleted row, pasted in insert statement from file opened in Crimson Editor

City looks right, but has wrong hex value:
| Luleå | 4C756C6586 |
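A plausible reading of this scenario is that, with SET NAMES latin1, the client bytes are taken as latin1 and stored unchanged, while the cp437 console still draws whatever comes back using its own code page. A minimal Python sketch of that reading (variable names are illustrative, and the pass-through behaviour is an assumption):

```python
import unicodedata

# Bytes as they would reach a latin1 column if nothing is converted.
file_bytes = b"Lule\xe5"                       # E5 straight from the SQL file
pasted_bytes = "Lule\u00e5".encode("cp437")    # what a cp437 console sends

for stored in (file_bytes, pasted_bytes):
    # The console decodes the returned bytes with its own code page (cp437).
    last_char = stored.decode("cp437")[-1]
    print(stored.hex().upper(), unicodedata.name(last_char))
```

The first line reports a Greek small sigma for the correct byte E5, and the second reports the a with ring for the wrong byte 86, which is the "looks wrong but is right" and "looks right but is wrong" pair above.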

III.
--------------------------
Ran: chcp 850 first

SET NAMES latin1;

Ran the sql file

City looks wrong in a different way, but hex has right value:
| LuleÕ | 4C756C65E5 |

Deleted row, pasted in insert statement from file opened in Crimson Editor

City looks right, but has wrong hex value:
| Luleå | 4C756C6586 |
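The different on-screen characters come down to how each console code page maps the same byte. A short Python sketch, assuming the console simply decodes each returned byte with the active code page (the helper name rendered_as is hypothetical):

```python
import unicodedata


def rendered_as(byte_value: int, code_page: str) -> str:
    """Name of the character that `code_page` assigns to a single byte."""
    return unicodedata.name(bytes([byte_value]).decode(code_page))


for code_page in ("cp437", "cp850", "cp1252"):
    print(code_page, "E5 ->", rendered_as(0xE5, code_page))
    print(code_page, "86 ->", rendered_as(0x86, code_page))
```

Byte E5 is a sigma in cp437, a capital O with tilde in cp850, and the a with ring only in cp1252, while byte 86 is the a with ring in both cp437 and cp850.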

IV.
--------------------------
Ran: chcp 1252 first

SET NAMES latin1;

Ran the sql file

City looks wrong in the command window, but the hex has the right value, and when copied and pasted the city actually looks right (on my screen in the command window it's the lowercase omega-looking character, a lowercase o with a tilde-like squiggle):
| Luleå | 4C756C65E5 |

Deleted row, pasted in insert statement from file opened in Crimson Editor

At least this is now the same as the result from the source command:
| Luleå | 4C756C65E5 |


So it looks like for running the sql file, Scenario IV is the best so far. I would then put SET NAMES latin1; at the top of the SQL file. It would be nice if I could then run queries in the command window that would display correctly, but I can live without that.

For what it's worth, the DB is eventually being displayed by a JSP, in which we use:
<%@ page language="java" contentType="text/html; charset=utf-8" pageEncoding="iso-8859-1"%>

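And the E5 character displays correctly. I believe that pageEncoding specifies the encoding of the JSP source file itself, and that the charset in contentType causes the response to be sent as utf-8. The latin1 data from MySQL is presumably converted to Java strings by the JDBC driver before it ever reaches the page, so it would not depend on pageEncoding.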
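The net effect between the database and the browser can be sketched in Python as a decode from latin1 followed by an encode to utf-8. This is only an analogy under the assumption that the Java stack does the equivalent conversion; it is not the JSP or JDBC code itself.

```python
# Stored latin1 bytes for the city, as reported by HEX() above.
latin1_bytes = bytes.fromhex("4C756C65E5")

# Step 1: the database layer turns latin1 bytes into a Unicode string.
text = latin1_bytes.decode("latin1")

# Step 2: the response is written out as utf-8 (charset=utf-8 in contentType).
utf8_bytes = text.encode("utf-8")

print(utf8_bytes.hex().upper())  # 4C756C65C3A5
```

The single latin1 byte E5 becomes the two-byte utf-8 sequence C3 A5, which a browser reading the page as utf-8 shows as the a with ring.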
