UTF-8 input is stored as HTML entities; how do I retrieve a substring of UTF-8 input?
Posted by: masikwha
Date: February 08, 2009 08:49PM

I posted about this issue in this thread [http://forums.mysql.com/read.php?10,246670,246670#msg-246670], but it was suggested that I go to the XML forum for an answer. Since this is a MySQL-related question, I am posting here again in the hope that someone can give me some suggestions.



What I am looking for is a function similar to substr($str, 1, 100) to use in a query, so that I can get a certain number of UTF-8 characters.

My table contains UTF-8 characters in a 'body' field (presumably UTF-8), inserted via a web form using Unicode characters. I need to pull only 200 characters of the 'body' field. Using substr(body, 1, 200) wouldn't work because these are UTF-8 characters, so is there a way to do this? A function like mb_substr(..) doesn't exist in MySQL.
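To make the byte-versus-character concern concrete, here is a minimal Python sketch (not MySQL, and not a claim about how MySQL's substr behaves). The sample string is hypothetical; it only shows that cutting UTF-8 by bytes and cutting by characters give different results.

```python
# Hypothetical sample: three Devanagari letters (U+0916) followed by ASCII.
text = "\u0916" * 3 + "abc"
raw = text.encode("utf-8")

chars_in_text = len(text)  # number of Unicode code points
bytes_in_text = len(raw)   # number of UTF-8 bytes (3 per Devanagari letter here)

# Cutting by characters keeps four whole code points.
by_chars = text[:4]

# Cutting by bytes stops in the middle of the second letter;
# errors="ignore" silently drops the incomplete trailing sequence.
by_bytes = raw[:4].decode("utf-8", errors="ignore")

print(chars_in_text, bytes_in_text)
print(len(by_chars), len(by_bytes))
```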

I am using MySQL 5.1. My table's character set is UTF8, collated utf8_unicode_ci; storage engine: InnoDB.

When I look at the table, the values in the body field, which were inserted via the web form using Unicode, are displayed as HTML entities. For example, a Devanagari Unicode letter:


is stored as:
#2326 [the full entity begins with '&#' and ends with ';'. I can't type the whole of it here because it would be displayed as क when you view this post]

That is 7 ASCII characters. This not only takes up a lot of space in the database table, but also leaves me no way (that I know of) to pull a certain number of characters, i.e. a substring. As in the example above, 7 stored characters equal 1 complete character in Devanagari Unicode. So how do I fetch, say, only 100 complete Devanagari characters from the table? That would mean pulling 100 * 7 = 700 stored characters to get 100 complete Devanagari characters. But the content of the field is dynamic, i.e. the data can be a mix of ASCII and Unicode, so what is the way around this?
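One possible way around the mixed ASCII/entity problem, sketched in Python on the application side rather than in MySQL: decode the numeric entities first, then cut by characters. This is only a sketch under assumptions: the helper name first_chars and the sample value are hypothetical, and html.unescape comes from the Python standard library.

```python
import html


def first_chars(stored: str, n: int) -> str:
    """Decode HTML character references, then keep the first n code points."""
    return html.unescape(stored)[:n]


# Hypothetical stored value: ASCII mixed with two 7-character numeric entities.
stored = "ab&#2326;&#2326;cd"

stored_length = len(stored)       # length as stored, entities counted as ASCII
preview = first_chars(stored, 3)  # "ab" plus one decoded Devanagari letter

print(stored_length, len(preview))
```

Because the entities are decoded before slicing, the cut never lands inside an entity, and it does not matter how many ASCII characters are mixed in. Note that this counts code points; a Devanagari character as a reader sees it can span several code points (for example a consonant plus a vowel sign), so a cut can still separate those.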


I hope someone can suggest a way out of this.
Thanks

