MySQL Forums
Forum List  »  Connector/Python

Confused about UTF8
Posted by: Kees Bakker
Date: March 27, 2008 04:19AM

Hi,

For a simple music database (artist, album, track) I want to enter UTF8 strings. But I can't get it to function properly. That is, I don't see what I expect.

I wrote a test script in Python as follows (just the important details). When I run that script it inserts an example record in the table. And when I look at the result with the command "mysql --execute 'select * from artist'" I don't see the expected é in the name. However, the output of the Python code suggests all is fine.

Can someone have a look at this and tell me where I go wrong?

#! /usr/bin/env python
# -*- coding: utf-8 -*-
...
db = MySQLdb.connect(host='localhost', db='muziek', use_unicode=True, charset='utf8')
cursor = db.cursor()
...
# insert a sample record, name is a unicode string
name = 'Angélique Kidjo'.decode('utf-8')
print [c for c in name]
sortname = 'Kidjo, Angélique'.decode('utf-8')
cmd = """INSERT INTO artist (name, sortname) values ("%(name)s", "%(sortname)s")""" % locals()
# Notice that mysqldb translates the unicode string into a normal string.
cursor.execute(cmd)
...
cmd = '''SELECT * FROM artist WHERE name = "%(name)s"''' % locals()
cursor.execute(cmd)
recs = cursor.fetchall()
print recs[0]

The output of this script is:
[u'A', u'n', u'g', u'\xe9', u'l', u'i', u'q', u'u', u'e', u' ', u'K', u'i', u'd', u'j', u'o']
(1L, u'Ang\xe9lique Kidjo', u'Kidjo, Ang\xe9lique')
That seems to be OK. However, ...

Next, I run the following on the commandline:
$ mysql muziek --execute 'select * from artist'
+----+-----------------+------------------+
| id | name | sortname |
+----+-----------------+------------------+
| 1 | Ang�lique Kidjo | Kidjo, Ang�lique |
+----+-----------------+------------------+

The letter after "Ang" is 0xe9. And that is puzzling me. Is that the letter that is stored in the database? I think it is, but how can I be sure?
The table is created as follows:
CREATE TABLE artist
(
id SERIAL,
name VARCHAR(255) NOT NULL,
sortname VARCHAR(255) NOT NULL
) CHARACTER SET "utf8";

I know, I know, if I use mysql --default-character-set=utf8 then it prints out the extpected result. But why do I need that options? Why didn't mysql know in the first place that the fields were UTF8? I'm afraid that I'm covering an error with another error.
--
Kees

Options: ReplyQuote


Subject
Written By
Posted
Confused about UTF8
March 27, 2008 04:19AM
March 27, 2008 08:16AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.