Adding collation to utf8mb4 charset
Posted by: Knut Edelbert
Date: April 26, 2018 05:33AM

If you want to add a custom collation in MySQL for UTF-8 charsets, you can modify .../charsets/Index.xml and extend the charset using LDML syntax:

<charset name="utf8">
  ...
  <collation name="utf8_myown_ci" id="1234">
    <rules>
      <reset>\u0000</reset>
        <i>\u0020</i> <!-- space -->
        ...
    </rules>
  </collation>
  ...
</charset>

But there is no charset tag with the name "utf8mb4". So I created one with name="utf8mb4" and added collation/rules tags, and in phpMyAdmin I could choose the newly created collation. But I couldn't insert four-byte characters; I get the error:

"#1366 - Incorrect string value: '\xF0\x9F\x8D\xB5\xF0\x9F...' for field ..."
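For reference, the leading bytes quoted in that error can be decoded to see which character is being rejected. This is only a small Python sketch for illustration (Python is not part of the MySQL setup, and the variable names are made up); it uses just the first four bytes, because the rest of the value is truncated in the message.

```python
# First four bytes quoted in the #1366 error message; the remainder is
# truncated in the message, so only this first character is decoded.
raw = b"\xF0\x9F\x8D\xB5"

char = raw.decode("utf-8")   # a single character if the bytes are valid UTF-8
code_point = ord(char)

print(f"U+{code_point:04X}")   # code point of the rejected character
print(len(raw))                # number of bytes in its UTF-8 encoding
print(code_point > 0xFFFF)     # True means it lies outside the Basic Multilingual Plane
```

A code point above U+FFFF always needs four bytes in UTF-8, which is exactly the kind of value a three-byte utf8 column cannot store.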

To be more precise: I have one column (a) with the built-in collation utf8mb4_general_ci and one column (b) with my own collation utf8mb4_myown_ci (defined in Index.xml). I insert the same data into both columns: column a accepts it without error, while column b gives the error described above.

I created the following entry in Index.xml:

<charset name="utf8mb4">
  <family>Unicode</family>
  <description>UTF-8 MB4 Unicode</description>
  <collation name="utf8mb4_general_ci" id="45">
    <flag>primary</flag>
    <flag>compiled</flag>
  </collation>
  <collation name="utf8mb4_bin" id="46">
    <flag>binary</flag>
    <flag>compiled</flag>
  </collation>
  <collation name="utf8mb4_myown_ci" id="213">
  </collation>
</charset>

Leaving the collation tag empty does not seem to be the problem, because I created an empty utf8_myown_ci inside the charset named "utf8" and that works.

In the column with utf8mb4_myown_ci I can still insert three-byte characters, so it seems the collation is being interpreted as a utf8 (three-byte) collation.
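To build test data for comparing the two columns, it helps to know which characters in a string need four bytes. The following is a rough Python sketch under that assumption; the helper names `utf8_width` and `chars_needing_four_bytes` and the sample string are hypothetical and exist only for this illustration.

```python
def utf8_width(ch: str) -> int:
    """Return how many bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def chars_needing_four_bytes(text: str) -> list:
    """Return the characters of `text` whose UTF-8 encoding is four bytes long."""
    return [ch for ch in text if utf8_width(ch) == 4]


# Hypothetical sample: one character each of 1, 2, 3 and 4 bytes.
sample = "a\u00e9\u20ac\U0001F375"

for ch in sample:
    print(f"U+{ord(ch):04X} -> {utf8_width(ch)} byte(s)")

print(chars_needing_four_bytes(sample))
```

Inserting the one- to three-byte characters and the four-byte character separately into columns a and b should show whether only the four-byte values are rejected in column b.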

I searched Google multiple times and also looked through this forum, but I couldn't find any hints on how to add collations to charsets that aren't present in Index.xml.

Any ideas how to do it? Thank you for any hints!

