Re: Byte level storage seems right, display seems wrong
Posted by: Rick James
Date: March 05, 2016 01:10AM

When I first had the double-encoding problem, it took me months to figure it out, and more months to be able to say what you just said.
So, I understand why others have trouble grokking it.

To add to the confusion, PHP comes in one way and the CLI comes in another; one looks right, the other doesn't, etc.

The bottom line is that there are about 4 places where latin1 or utf8 needs to be stated -- and they all need to match. You had a case where two things were wrong, yet "two wrongs made a right". It's perhaps the most insidious case.
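For anyone following along, here is a minimal sketch of the "two wrongs made a right" case, done in Python rather than in MySQL itself. It assumes the client sends utf8 bytes while the connection is declared latin1, and the column is utf8. Python's "latin-1" codec stands in for MySQL's latin1 here (they agree for the character used), and all variable names are made up for the illustration.

```python
text = "é"

# The client really sends utf8 bytes.
client_bytes = text.encode("utf-8")              # C3 A9

# Wrong #1 (on INSERT): the connection says latin1, so the server reads
# each byte as a latin1 character and converts it to utf8 for the column.
misread = client_bytes.decode("latin-1")         # two characters: 'Ã©'
stored = misread.encode("utf-8")                 # C3 83 C2 A9 (double-encoded)

# Wrong #2 (on SELECT): the server converts the column back to latin1 for
# the connection, and the client treats the resulting bytes as utf8.
sent_back = stored.decode("utf-8").encode("latin-1")
displayed = sent_back.decode("utf-8")

print(stored.hex().upper())   # C383C2A9
print(displayed == text)      # True
```

The display round-trips perfectly, yet the table holds 4 bytes where 2 were expected. Anything that works on the stored bytes directly (length, sorting, another client with a correct setup) sees the damage.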

Other cases are Mojibake (gibberish), truncation, string of question marks, and black diamond with question mark in it. All of those are visible, so it is easy to understand that something is wrong.
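A rough Python imitation of those four visible symptoms, for illustration only. These lines mimic what the mismatches look like; they are not what MySQL runs internally, and the truncation case in particular is just an emulation of "stop at the first invalid byte".

```python
word = "café"

# Mojibake: utf8 bytes interpreted as latin1.
mojibake = word.encode("utf-8").decode("latin-1")

# Question mark: the character cannot be represented in the target charset.
question = word.encode("ascii", errors="replace").decode("ascii")

# Black diamond: latin1 bytes handed to something expecting utf8.
latin1_bytes = word.encode("latin-1")            # 63 61 66 E9
diamond = latin1_bytes.decode("utf-8", errors="replace")

# Truncation (emulated): keep only what precedes the first invalid byte.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    truncated = latin1_bytes[:err.start].decode("utf-8")

print(mojibake)    # cafÃ©
print(question)    # caf?
print(diamond)     # caf followed by U+FFFD, the black diamond
print(truncated)   # caf
```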

With 4 places to state the character set correctly or incorrectly, that leads to 16 combinations. But they boil down into the 5 types of errors.
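The count of 16 is just 2 choices in each of 4 places. A small sketch that enumerates them; the four place names below are my placeholders for the illustration, not an official list, and no attempt is made here to map each combination onto one of the error types.

```python
from itertools import product

# Hypothetical labels for the 4 places where a charset gets stated.
PLACES = ("client_bytes", "connection_setting", "column_charset", "output_page")
CHARSETS = ("latin1", "utf8")

combos = list(product(CHARSETS, repeat=len(PLACES)))

# A combination is consistent only when all four places agree.
consistent = [c for c in combos if len(set(c)) == 1]

print(len(combos))       # 16
print(len(consistent))   # 2
```

Only the all-latin1 and all-utf8 rows have everything matching; the other 14 have at least one mismatch somewhere.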

I have been working on this problem for many years. My notes are in . Alas, it needs major polishing.

So much for my rambling... Back to your question...

Why are PHP and CLI sometimes inconsistent? I think it is actually the browser that sometimes steps in and says "Hmmm, even though you told me it is latin1, that looks like utf8, so I will be a nice guy and convert it for you." Until I realized that, I was tearing my hair out. That led me to insist on HEX(col), which shows the bytes actually stored and so bypasses any confusing "nice guy" stuff.
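Once you have the hex string that HEX(col) returned, a heuristic like the following Python sketch can help read it. The function name and the wording of its results are made up for this example, and it is only a guess-maker: a legitimately stored string such as 'Ã©' would be flagged as double-encoded too.

```python
def describe_hex(hex_string):
    """Guess how the bytes behind a HEX(col) result were encoded."""
    raw = bytes.fromhex(hex_string)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid utf8 (possibly latin1)"
    try:
        # If the text survives a second latin1 -> utf8 unwrapping and
        # changes in the process, it was probably encoded twice.
        inner = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "utf8"
    return "possibly double-encoded" if inner != text else "utf8"

print(describe_hex("C3A9"))       # utf8
print(describe_hex("C383C2A9"))   # possibly double-encoded
print(describe_hex("E9"))         # not valid utf8 (possibly latin1)
```

For a single é, those three hex values are the ones to look for: E9 is latin1, C3A9 is correct utf8, and C383C2A9 is double encoding.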

I have a feeling I have not quite answered all your questions. (Sorry, I can't lecture to your users any better than you can.)

