MySQL Forums

MySQLdb crashes when given unicode(?) data
Posted by: Brad Smith
Date: October 24, 2006 07:18AM

Hi folks,

I am the maintainer of Fedora Tracker (www.fedoratracker.org), a search engine for package repositories of the Fedora Project's Linux distribution. I've run into a problem that has me stuck, and I hope someone here can help with it.

Basically, the back-end component of the tracker reads XML files that describe each package in a repository and then stores the information for each package in a MySQL database. Recently, though, an RPM showed up in one of the repositories whose description seems to contain some non-ASCII characters (trademark and copyright symbols, I think; Unicode is really not something I've dealt with a lot). These are causing the MySQLdb module to crash, and I can't figure out how to get Python to translate them into something inoffensive. I tried adding:

if type(q) == types.UnicodeType:
    q = q.encode("utf-8", "replace")

to deal with it, but it didn't work. It was really just a blind guess anyway; like I said, I'll be the first to admit that character encoding is not my strong suit.
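One thing worth noting about the check above: it only fires when q is a unicode object, so a query that is already a byte string passes through unchanged. A minimal sketch of the opposite direction, decoding byte strings to text, assuming the package data is UTF-8 encoded. The helper name to_text and the sample value are hypothetical, not part of MySQLdb or the tracker code, and whether this alone resolves the crash depends on the connection's character set:

```python
def to_text(value, encoding="utf-8"):
    """Return value as text: byte strings are decoded, anything else passes through."""
    if isinstance(value, bytes):
        # "replace" substitutes U+FFFD for undecodable bytes instead of raising.
        return value.decode(encoding, "replace")
    return value


# Sample bytes modelled on the description field (assumed to be UTF-8).
description = b"kernel drivers for the Intel\xc2\xae PRO/Wireless 2100"
text = to_text(description)
```

Here the two bytes \xc2\xae decode to the single character U+00AE, so text is a proper Unicode string rather than raw bytes.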

Anyway, here is the complete query being executed (note the "Intel\xc2\xae" in
the description field):

"INSERT INTO package_fedora_5 SET `name` = 'ipw2100-kmdl-2.6.17-1.2174_FC5',
`version` = '1.2.0', `release` = '41.rhfc5.at', `url` =
'http://ipw2100.sourceforge.net/', `dlurl` =
'http://dl.atrpms.net/fc5-x86_64/atrpms/stable/ipw2100-kmdl-2.6.17-1.2174_FC5-1.2.0-41.rhfc5.at.x86_64.rpm',
`description` = 'This package contains kernel drivers for the Intel\xc2\xae
PRO/Wireless 2100.\n\n\nThis package contains the ipw2100-kmdl-2.6.17-1.2174_FC5
kernel modules for the Linux kernel
package:\nkernel-2.6.17-1.2174_FC5.x86_64.rpm.', `rpmgroup` = 'System
Environment/Kernel', `vendor` = 'ATrpms.net', `packager` = 'ATrpms
http://ATrpms.net/', `prein` = 'NULL', `postin` = 'NULL', `preun` = 'NULL',
`postun` = 'NULL', `arch` = 'x86_64', `checksum` =
'sha:bf3ba4e450021eac031a6e3412980d051ff059f6', `changelog` = 'NULL', `fileList`
= '', `package_id` = NULL, `repo_id` = 5, `epoch` = 0, `numfiles` = 0"


And here is the resulting crash (note: "0xc2"):


Traceback (most recent call last):
  File "./tracker-process.py", line 110, in ?
    db.updateRepo(r)
  File "/home/brads/www/trackerBE.py", line 759, in updateRepo
    ret = self.storeRpmInfo(pkg,storeMe.repo_id,storeMe.version,storeMe.url)
  File "/home/brads/www/trackerBE.py", line 852, in storeRpmInfo
    self.execute(query)
  File "/home/brads/www/trackerBE.py", line 304, in execute
    res = self.cursor.execute(q)
  File "/home/brads/pymods/MySQLdb/cursors.py", line 146, in execute
    query = query.encode(charset)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 353: ordinal not in range(128)
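
The traceback reports a decode error from a call to encode. In Python 2, calling .encode() on a byte string first decodes it implicitly with the default ASCII codec, and that step fails on the first non-ASCII byte. A minimal sketch that performs the decode step explicitly to reproduce the same kind of message, assuming the query reaches cursor.execute as a byte string; the sample bytes are a shortened piece of the description field and the variable names are illustrative only:

```python
# Shortened sample of the description field; \xc2\xae is a two-byte UTF-8 sequence.
raw = b"for the Intel\xc2\xae PRO/Wireless 2100"

message = None
bad_byte = None
try:
    # The step Python 2 performs implicitly before encoding a byte string.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)                 # same wording as the last line of the traceback
    bad_byte = raw[exc.start:exc.end]  # the byte the ASCII codec rejected
```

In this shortened sample the rejected byte sits at offset 13 rather than 353, but it is the same 0xc2 byte named in the traceback.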


This is currently breaking repo processing, which means that if it's not dealt
with in the next couple of days it will affect Fedora Tracker's ability to keep
up with the FC6 release, so any help would be greatly appreciated.

The relevant code is here if anyone wants to see it in context:

http://fedoratracker.cvs.sourceforge.net/fedoratracker/fedoratracker/trackerBE.py?revision=1.50&view=markup

Thanks!
--Brad
