MySQL Forums
Forum List  »  Newbie

Re: Searching database for strings with Swedish characters
Posted by: Rick James
Date: December 14, 2013 12:23PM

Good analysis, however...

"a1" is not hex for "a" in either latin1 or utf8.
"a1", in utf8, is ¡ (¡). "61" is 'a'.

How was "ä" keyed into the Mac?

Is the collation utf8_unicode_ci?

Still, your analysis leads to a likely solution...

CREATE TABLE `CombiningDiacritical` (
  `id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  `general` varchar(9) NOT NULL,
  `unicode` varchar(9) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8;

INSERT INTO CombiningDiacritical
(general, unicode)
VALUES
(UNHEX('c3a4'), UNHEX('c3a4')),
(UNHEX('61cc88'), UNHEX('61cc88'));

SELECT id,
    general = UNHEX('c3a4'),
    unicode = UNHEX('c3a4'),
    general = UNHEX('61cc88'),
    unicode = UNHEX('61cc88')
FROM CombiningDiacritical\G
*************************** 1. row ***************************
                       id: 1
  general = UNHEX('c3a4'): 1
  unicode = UNHEX('c3a4'): 1
general = UNHEX('61cc88'): 0
unicode = UNHEX('61cc88'): 1
*************************** 2. row ***************************
                       id: 2
  general = UNHEX('c3a4'): 0
  unicode = UNHEX('c3a4'): 1
general = UNHEX('61cc88'): 1
unicode = UNHEX('61cc88'): 1

Notice how utf8_general_ci fails ("0") to match the 'other' encoding for "ä".

Perhaps the solution is as simple as
ALTER TABLE ... MODIFY COLUMN ... COLLATE utf8_unicode_ci ...;

Thanks for giving me something to add to http://mysql.rjweb.org/doc.php/charcoll

Options: ReplyQuote




Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.