Confused about UTF8
Hi,
For a simple music database (artist, album, track) I want to enter UTF8 strings. But I can't get it to function properly. That is, I don't see what I expect.
I wrote a test script in Python as follows (just the important details). When I run that script it inserts an example record in the table. And when I look at the result with the command "mysql --execute 'select * from artist'" I don't see the expected é in the name. However, the output of the Python code suggests all is fine.
Can someone have a look at this and tell me where I go wrong?
#! /usr/bin/env python
# -*- coding: utf-8 -*-
...
db = MySQLdb.connect(host='localhost', db='muziek', use_unicode=True, charset='utf8')
cursor = db.cursor()
...
# insert a sample record, name is a unicode string
name = 'Angélique Kidjo'.decode('utf-8')
print [c for c in name]
sortname = 'Kidjo, Angélique'.decode('utf-8')
cmd = """INSERT INTO artist (name, sortname) values ("%(name)s", "%(sortname)s")""" % locals()
# Notice that mysqldb translates the unicode string into a normal string.
cursor.execute(cmd)
...
cmd = '''SELECT * FROM artist WHERE name = "%(name)s"''' % locals()
cursor.execute(cmd)
recs = cursor.fetchall()
print recs[0]
The output of this script is:
[u'A', u'n', u'g', u'\xe9', u'l', u'i', u'q', u'u', u'e', u' ', u'K', u'i', u'd', u'j', u'o']
(1L, u'Ang\xe9lique Kidjo', u'Kidjo, Ang\xe9lique')
That seems to be OK. However, ...
Next, I run the following on the commandline:
$ mysql muziek --execute 'select * from artist'
+----+-----------------+------------------+
| id | name | sortname |
+----+-----------------+------------------+
| 1 | Ang�lique Kidjo | Kidjo, Ang�lique |
+----+-----------------+------------------+
The letter after "Ang" is 0xe9. And that is puzzling me. Is that the letter that is stored in the database? I think it is, but how can I be sure?
The table is created as follows:
CREATE TABLE artist
(
id SERIAL,
name VARCHAR(255) NOT NULL,
sortname VARCHAR(255) NOT NULL
) CHARACTER SET "utf8";
I know, I know, if I use mysql --default-character-set=utf8 then it prints out the extpected result. But why do I need that options? Why didn't mysql know in the first place that the fields were UTF8? I'm afraid that I'm covering an error with another error.
--
Kees