Re: Northwind DB and character encoding
Thanks for the helpful info, particularly the link.
I can now get to my expected result, but am still not sure of exactly what's going on.
One issue with my first attempt was that I had run mysqld directly from my MySQL bin directory, but then changed my command prompt to the directory containing the SQL file and ran mysql from there. It turns out that my system PATH pointed to an earlier version of MySQL, which produced different results from the ones below. Everything below was run with the correct version of mysql.exe.
SQL file values as shown in a hex editor:
Company Name: Berglunds snabbköp 426572676C756E647320736E6162626BF670
Address: Berguvsvägen 4265726775767376E467656E
City: Luleå 4C756C65E5
Concentrating on the city: from what I understand, E5 is the correct latin1 (ISO-8859-1) code for the lowercase "a with circle" (å).
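For anyone who wants to check that without a hex editor, here is a minimal Python sketch using only the standard codecs. The helper name `hex_of` is just something made up for this post, and the character is written as an escape so the script does not depend on how the source file itself is saved.

```python
def hex_of(text: str, encoding: str) -> str:
    """Return the uppercase hex of `text` after encoding it with `encoding`."""
    return text.encode(encoding).hex().upper()

A_RING = "\u00e5"  # lowercase a with ring above

print(hex_of(A_RING, "latin-1"))           # E5
print(hex_of("Lule" + A_RING, "latin-1"))  # 4C756C65E5
```

This matches the City value from the hex editor, so the SQL file itself appears to be latin1.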
The SQL file creates the database this way:
CREATE DATABASE Northwind DEFAULT CHARACTER SET latin1;
The following test scenarios were run in a command prompt window:
I.
--------------------------
chcp said code page was 437 - left it that way
No set names used
Ran the sql file
City looks wrong, and hex has wrong value:
| Luleσ | 4C756C65D5 |
Deleted row, pasted in insert statement from file opened in Crimson Editor
City looks right, and has expected hex value:
| Luleå | 4C756C65E5 |
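One explanation that fits the numbers in Scenario I, though it is only a guess and nothing above confirms it, is that without SET NAMES the client declared its character set as cp850, so the server converted every incoming byte from cp850 to latin1. The sketch below tests that hypothesis in Python. The function name `stored_hex` and the cp850 default are assumptions, as is the idea that the console hands a pasted å to mysql as byte 86.

```python
def stored_hex(client_byte: int, client_charset: str = "cp850",
               column_charset: str = "latin-1") -> str:
    """Hex of the byte that ends up stored if the server converts one
    incoming byte from client_charset (assumed cp850) to column_charset."""
    char = bytes([client_byte]).decode(client_charset)
    return char.encode(column_charset).hex().upper()

# Byte E5 from the SQL file, misread as cp850 (where E5 is O with tilde):
print(stored_hex(0xE5))  # D5

# Byte 86 from a pasted a-ring (assumed console byte), read as cp850:
print(stored_hex(0x86))  # E5
```

Under that assumption the file import gives D5 and the pasted insert gives E5, which is what the two rows above show.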
II.
--------------------------
Did not run chcp
SET NAMES latin1;
Ran the sql file
City looks wrong, but hex has right value:
| Luleσ | 4C756C65E5 |
Deleted row, pasted in insert statement from file opened in Crimson Editor
City looks right, but has wrong hex value:
| Luleå | 4C756C6586 |
III.
--------------------------
Ran: chcp 850 first
SET NAMES latin1;
Ran the sql file
City looks wrong in a different way, but hex has right value:
| LuleÕ | 4C756C65E5 |
Deleted row, pasted in insert statement from file opened in Crimson Editor
City looks right, but has wrong hex value:
| Luleå | 4C756C6586 |
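Scenarios II and III look consistent with SET NAMES latin1 making the server store and return bytes unchanged, while the console window still draws and types bytes in its own code page. Here is a rough Python sketch of that reading. The function names `shown_as` and `pasted_as` are invented for illustration, and it prints Unicode character names rather than the characters themselves so the output does not depend on the terminal.

```python
import unicodedata

def shown_as(stored_byte: int, console_cp: str) -> str:
    """Unicode name of the character the console draws for one stored byte."""
    return unicodedata.name(bytes([stored_byte]).decode(console_cp))

def pasted_as(char: str, console_cp: str) -> str:
    """Hex of the byte the console sends when `char` is pasted."""
    return char.encode(console_cp).hex().upper()

for cp in ("cp437", "cp850"):
    print(cp, "| E5 shows as:", shown_as(0xE5, cp),
          "| pasted a-ring is sent as:", pasted_as("\u00e5", cp))
# cp437 | E5 shows as: GREEK SMALL LETTER SIGMA | pasted a-ring is sent as: 86
# cp850 | E5 shows as: LATIN CAPITAL LETTER O WITH TILDE | pasted a-ring is sent as: 86
```

So the file's E5 is stored correctly but drawn as σ or Õ, and the pasted å is stored as 86, which the console then draws correctly even though it is the wrong byte for latin1.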
IV.
--------------------------
Ran: chcp 1252 first
SET NAMES latin1;
Ran the sql file
City looks wrong in the command window, but hex has the right value, and when copied and pasted elsewhere, the city actually looks right (on my screen in the command window it's the lowercase omega-looking character, a lowercase o with a tilde-like squiggle):
| Luleå | 4C756C65E5 |
Deleted row, pasted in insert statement from file opened in Crimson Editor
At least this is now the same as from source command:
| Luleå | 4C756C65E5 |
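A possible reason Scenario IV behaves better is that code page 1252 and latin1 use the same byte for å, so a pasted value and the file's value end up identical. The following Python sketch compares the code pages tried above. The helper `matches_latin1` is a made-up name, and this only compares encodings, it does not talk to MySQL.

```python
def matches_latin1(text: str, console_cp: str) -> bool:
    """True if `text` encodes to the same bytes in console_cp and latin1."""
    return text.encode(console_cp) == text.encode("latin-1")

CITY = "Lule\u00e5"  # escape used so the script's own encoding does not matter

for cp in ("cp437", "cp850", "cp1252"):
    print(cp, CITY.encode(cp).hex().upper(), matches_latin1(CITY, cp))
# cp437 4C756C6586 False
# cp850 4C756C6586 False
# cp1252 4C756C65E5 True
```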
So it looks like for running the sql file, Scenario IV is the best so far. I would then put SET NAMES latin1; at the top of the SQL file. It would be nice if I could then run queries in the command window that would display correctly, but I can live without that.
For what it's worth, the DB is eventually being displayed by a JSP, in which we use:
<%@ page language="java" contentType="text/html; charset=utf-8" pageEncoding="iso-8859-1"%>
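And the E5 character displays correctly. I believe that pageEncoding specifies the encoding of the JSP source file itself, and that the charset in contentType causes the JSP's response to be written as utf-8. I am less sure how the latin1 data from MySQL fits in; as far as I can tell it is converted on its way into Java strings, separately from pageEncoding.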
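To illustrate what the conversion to utf-8 means for this particular character, here is a small Python sketch. It is not what the JSP container literally runs, only the same byte-level transformation: decode the latin1 bytes to text, then encode the text as UTF-8. The function name `latin1_to_utf8` is hypothetical.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Decode latin1 bytes to text, then re-encode the text as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")

city_latin1 = bytes.fromhex("4C756C65E5")  # the City bytes from the SQL file
print(latin1_to_utf8(city_latin1).hex().upper())  # 4C756C65C3A5
```

The single byte E5 becomes the two bytes C3 A5 in UTF-8, which is what a browser expects when the page declares charset=utf-8.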