MySQL Forums

Re: Arabic text in mysql Varchar row
Posted by: Rick James
Date: September 10, 2011 03:07PM

C3993F20C398C2A7C3993FC398C2AEC398C2B720
Yuck! It is worse than just double-encoding.

First, let's look at the first part of that, split into utf8 sequences (top row), with each sequence then converted back to a single byte (bottom row):
        C399 3F 20 C398 C2A7 C399 3F C398 C2AE C398 C2B7 20
        D9   3F 20 D8   A7   D9   3F D8   AE   D8   B7   20
Notes:  6....3. 4.                3.           5........ 4.
Notes:
1. C399/C398 -- came from D9/D8
2. Arabic characters, in utf8, begin with D8 or D9 for the basic letters (the full Arabic block, U+0600 to U+06FF, uses lead bytes D8 through DB).
3. 3F is "?", which is often used when an illegal encoding is being converted.
4. 20 is a space -- No problem with this.
5. C398 C2B7 -- D8 B7 -- D8B7 is utf8 for "ARABIC LETTER TAH"; C398C2B7 is the 'double encoding' of that.
6. Some pair of bytes D9xx (I don't know what xx) failed in converting to C399yyyy. Instead of yyyy, you got 3F ('?'). Then coming back, the C399 (U-grave) went to D9, but '?' stayed '?'. That led to an illegal utf8 code 'D93F', hence the diamond with the '?'.
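
For anyone who wants to reproduce the breakdown above, here is a minimal Python sketch. It only assumes the hex string from the top of this post; the variable names are mine, and Python's "latin-1" codec (ISO-8859-1) is standing in for whatever single-byte charset was actually involved in the bad conversion.

```python
import unicodedata

# The hex from the table, exactly as posted.
stored = bytes.fromhex("C3993F20C398C2A7C3993FC398C2AEC398C2B720")

# Undo one layer: read the bytes as utf8, then map each resulting
# character back to a single byte (C399 -> D9, C398 -> D8, C2B7 -> B7).
undone = stored.decode("utf-8").encode("latin-1")
print(" ".join(f"{b:02X}" for b in undone))
# D9 3F 20 D8 A7 D9 3F D8 AE D8 B7 20

# Note 5: D8B7 is a real Arabic letter, and C398C2B7 is its double encoding.
tah = bytes.fromhex("D8B7").decode("utf-8")
print(unicodedata.name(tah))  # ARABIC LETTER TAH
double = tah.encode("utf-8").decode("latin-1").encode("utf-8")
print(double.hex().upper())   # C398C2B7

# Note 6: D9 followed by 3F is not legal utf8, so a strict decode fails.
try:
    undone.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)  # False

# A lenient decode puts U+FFFD (the diamond) where each lone D9 was;
# the letters that were lost do not come back.
repaired = undone.decode("utf-8", errors="replace")
print(repaired.count("\ufffd"))  # 2
```

The last two steps are the point: undoing the double encoding works for the intact letters, but the two D9xx characters that turned into '?' cannot be recovered from the stored bytes.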

I'm afraid that the data in your table is corrupted beyond recovery. Start over and re-insert the data. Be sure to check the HEX as soon as you have some Arabic loaded. A cursory check of the hex: C398 and C399 are bad; D8 and D9 are good.
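
If you want to automate that cursory check, something like the following Python sketch would do. It assumes you have already pulled the hex out of MySQL yourself (for example with HEX() on the column); the function name `classify_hex` and its return strings are made up for illustration, and this is only a heuristic for Arabic text, not a general encoding validator.

```python
def classify_hex(hex_string):
    """Rough check of a HEX() dump of a column that should hold Arabic utf8."""
    data = bytes.fromhex(hex_string)
    # C398 / C399 are the utf8 encodings of the latin1 characters D8 / D9,
    # which is the signature of double-encoded Arabic.
    if b"\xc3\x98" in data or b"\xc3\x99" in data:
        return "bad: looks double-encoded"
    # D8 / D9 can only be lead bytes in utf8, never continuation bytes.
    if b"\xd8" in data or b"\xd9" in data:
        return "good: Arabic lead bytes present"
    return "unknown: no D8/D9 found"


print(classify_hex("D8B7"))      # good: Arabic lead bytes present
print(classify_hex("C398C2B7"))  # bad: looks double-encoded
```

Run it on the first few rows right after loading, before the table fills up with data that would have to be redone.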

