UTF-8 inputs stored as HTML entities: how to retrieve a substring of UTF-8 input?
Posted by: masikwha
Date: February 08, 2009 08:49PM
I posted about this issue in this thread [http://forums.mysql.com/read.php?10,246670,246670#msg-246670] and was advised to try the XML forum for an answer, but since this is a MySQL-related question I am posting here again in the hope that someone can give me a suggestion.
What I am looking for is a function similar to substr($str, 1, 100) that I can use in a query, so that I can get a certain number of UTF-8 characters.
My table contains UTF-8 characters in a 'body' field (presumably UTF-8), inserted via a web form using Unicode characters. I need to pull only 200 characters of the 'body' field. Using substr(body, 1, 200) wouldn't work because these are UTF-8 characters, so is there another way to do this? An mb_substr(..) function doesn't exist in MySQL.
I am using MySQL 5.1. My table's character set is utf8 with collation utf8_unicode_ci, and the storage engine is InnoDB.
When I look at the table, the input in the body field, which was inserted via the web form using Unicode, is displayed as HTML entities. For example, a Devanagari Unicode letter:
is stored as:
#2326 [in full it begins with '&#' and ends with ';'. I can't type the whole thing here because it would be displayed as क when you view it]
That is 7 ASCII characters. This not only takes up a lot of space in the database table but also leaves me no way (that I know of) to pull a given number of characters, i.e. a substring. As in the example above, 7 ASCII characters equal one complete Devanagari character, so to fetch, say, only 100 complete Devanagari characters from the table I would have to pull 100 * 7 = 700 stored characters. But the content of the field is dynamic, i.e. it can be a mix of ASCII and Unicode, so a fixed multiplier does not work. What is the way around this?
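To illustrate what I am after, here is a minimal sketch in Python rather than SQL. It assumes the stored values are numeric character references like the one above, mixed with plain ASCII. The helper name first_chars is made up for this example. It decodes the entities with the standard library's html.unescape and only then slices, so each entity counts as one character regardless of how many ASCII characters it occupies in the table.

```python
import html


def first_chars(stored: str, n: int) -> str:
    """Return the first n real characters of an entity-encoded string.

    Hypothetical helper: assumes `stored` holds HTML character
    references (e.g. "&#2326;") mixed with plain ASCII text.
    """
    decoded = html.unescape(stored)  # each "&#2326;" becomes one code point
    return decoded[:n]               # Python slices str by code point


# Three entity-encoded letters followed by plain ASCII.
sample = "&#2326;&#2326;&#2326; abc"
print(len(sample))                 # 25 characters as stored
print(len(html.unescape(sample)))  # 7 characters after decoding
print(first_chars(sample, 4) == chr(2326) * 3 + " ")  # True
```

This counts Unicode code points, so a Devanagari consonant followed by a vowel sign would still count as two. Doing the same thing inside a MySQL query is the part I do not know how to do.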
I hope someone can suggest a way out of this.