MySQL Forums - Character Sets, Collation, Unicode

Collation and illegal mix (1 reply)

Nigel Gomm — Tue, 24 Sep 2024 13:56:01 +0000

I have a MySQL 8.0.37 server running on Azure. The server, the database, all tables, and all columns are set to character set latin1 with collation latin1_general_ci.

When I create a view with unions from a user whose session variable collation_connection is utf8mb4_0900_ai_ci, the view works as it should for every other user.

If I create the view from a user whose session variable is 'latin1_general_ci', every user gets the "Illegal mix of collations for operation 'UNION'" error message.

Is that a bug, or do I have a fundamental misunderstanding of what's going on?

Collations break on MySQL Update (no replies)

Simon Anthony — Fri, 09 Aug 2024 09:44:52 +0000

We use a non-standard collation so that customers can search for products using terms such as 14.4v or P-ABCDE. The period and hyphen in the voltage or part number would normally be treated by FullText as word breaks; changing the collation prevents this.

Our hosting partner keeps 'updating' MySQL and restoring the default XML files (Index.xml and latin1.xml) over our modified files, after which the collation (1002) goes missing and the website fails.

My question is: how do we stop this? Is the Oracle MySQL update overwriting the files (unlikely), or is the hosting company doing it? They of course say it's the update, so I'd like to know if that's true before proceeding.

Having a website down for a day each month is getting seriously tedious.

Thanks.

Question marks instead of emoji when exporting from the mysql database to a .sql file (5 replies)

Микола Расік — Wed, 27 Mar 2024 08:31:37 +0000

I would like to export my emoji database, but I'm having a problem with the export. When I export my table to a .sql file, some characters are replaced with "?", specifically the flag symbols. For example, the data (USA 🇺🇸 ☺) became (USA ?? ☺) after export. I use MySQL Workbench, and utf8mb4 is set everywhere in the database. After exporting, I viewed the file in Notepad and it contained "?" characters; when I imported the same file into another database, the "?" characters were still there. Does anyone know what to do in this situation?

I tried setting different encodings in the database, but nothing works. I have read a number of articles about this problem, but I could not solve it.
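
One possible explanation, offered as a guess rather than a diagnosis: each flag is made of two code points outside the Basic Multilingual Plane, and each of those needs 4 bytes in UTF-8, while ☺ needs only 3. If any step of the export path uses a 3-byte character set (MySQL's utf8/utf8mb3) instead of utf8mb4, exactly those characters would be lost. The Python sketch below only simulates that effect; the helper name simulate_utf8mb3 is hypothetical and is not part of MySQL or Workbench.

```python
def simulate_utf8mb3(text: str) -> str:
    """Replace every character that needs more than 3 UTF-8 bytes with '?'.

    Assumption: this mimics what a 3-byte-only character set does to
    supplementary-plane characters such as the flag code points.
    """
    return "".join(
        ch if len(ch.encode("utf-8")) <= 3 else "?"
        for ch in text
    )


# "USA", the US flag (two regional-indicator code points), and a smiley.
sample = "USA \U0001F1FA\U0001F1F8 \u263A"

# Show how many UTF-8 bytes each character needs.
for ch in sample:
    print(f"U+{ord(ch):04X} needs {len(ch.encode('utf-8'))} byte(s)")

print(simulate_utf8mb3(sample))
```

If the simulated output matches what the export produces, the connection or export character set would be the first thing to check.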

Not able to insert arbitrary binary data / invalid UTF8 characters into a VARCHAR column (1 reply)

Xiaoming Pan — Mon, 15 Jan 2024 09:40:06 +0000

We have a VARCHAR(255) column using collation utf8_unicode_ci in a table.

We can write arbitrary byte sequences (data that contains invalid UTF8 character sequences) using INSERT or UPDATE statements in MySQL 5.7.43. However, we get errors while performing the same actions with the same configurations (character set utf8mb3 collate utf8mb3_unicode_ci) in MySQL 8.0.33.

For example, I’ve tried the following:

INSERT INTO data_tests (data) VALUES (0xEDA0BCEDB7A9EDA0BCEDB7AA);

In MySQL 5.7.43, the arbitrary byte sequence is written into the table successfully:

Query OK, 1 row affected (0.01 sec)

In MySQL 8.0.33, I get the following error:

ERROR 1366 (HY000): Incorrect string value: '\xED\xA0\xBC\xED\xB7\xA9...' for column 'data' at row 1

I also tried CONVERT( … USING UTF8) and BINARY( … ), but neither of them works in MySQL 8.0.33.

How can I write an INSERT or UPDATE statement that bypasses the check/validation, allowing me to write arbitrary byte sequences in MySQL 8.0.33?
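
A side note on the bytes themselves, as a sketch rather than an answer to the question: ED A0 BC ED B7 A9 is not well-formed UTF-8. It looks like a UTF-16 surrogate pair encoded one surrogate at a time (often called CESU-8), which strict validators reject. The Python snippet below shows the rejection and one way to re-encode such data as well-formed UTF-8. Whether that conversion is acceptable for this application is an assumption, and the repaired characters need 4 bytes each, so they would require a utf8mb4 column rather than utf8mb3.

```python
raw = bytes.fromhex("EDA0BCEDB7A9EDA0BCEDB7AA")

# A strict UTF-8 decoder rejects encoded surrogates.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print("valid strict UTF-8:", strict_ok)

# Decode while letting surrogates through, then round-trip via UTF-16
# so that each high/low surrogate pair is joined into one code point.
surrogates = raw.decode("utf-8", errors="surrogatepass")
repaired = surrogates.encode("utf-16-le", errors="surrogatepass").decode("utf-16-le")

print([f"U+{ord(ch):04X}" for ch in repaired])
print(repaired.encode("utf-8").hex().upper())
```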

utf8mb4 introducer added to generated column expression (no replies)

Jason Brunette — Wed, 19 Oct 2022 15:44:59 +0000

Stock MySQL 8.0.31, Linux

CREATE TABLE `test` (
`id` int NOT NULL AUTO_INCREMENT,
`test_col` VARCHAR(45) GENERATED ALWAYS AS ("test value"),
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
SELECT GENERATION_EXPRESSION FROM information_schema.columns WHERE TABLE_NAME="test" AND COLUMN_NAME="test_col";

GENERATION_EXPRESSION = _utf8mb4\'test value\'

Why does test_col's GENERATION_EXPRESSION gain the _utf8mb4 introducer? Is there something I can do to prevent this? My app has a technical need for this to not happen. Plus, this seems unnecessary with everything in MySQL 8 defaulting to utf8mb4. This began when we migrated from MySQL 5.7 to MySQL 8.0.31.

Thanks.

The number of attributes is larger than the number of attribute values provided (500) (10 replies)

Dennis Lim — Wed, 03 Aug 2022 22:36:58 +0000

Hi All,

For the longest time I have been using Visual FoxPro with MySQL as my database backend. Also, I have been using SELECT COUNT(*) to get the total number of rows for any SELECT statement.

But this one is weird:

select count(*) as totrecs from pihdr a left join supplier b on a.supplierid=b.id left join ewtax c on a.ewtaxid=c.id left join jthdr d on a.jthdrid=d.id

In my ODBC trace log, this is what I found:

DIAG [01000] [MySQL][ODBC 8.0(w)Driver][mysqld-8.0.30]The number of attributes is larger than the number of attribute values provided (500)

The offending statement does not even show in the ODBC Trace log.

That is why I tried substituting it with the following (which is already deprecated):

select SQL_CALC_FOUND_ROWS * from pihdr a left join supplier b on
a.supplierid=b.id left join ewtax c on a.ewtaxid=c.id left join jthdr d
on a.jthdrid=d.id

select FOUND_ROWS() as totrecs

This still yields the same error.

My code works perfectly with MySQL ODBC 8.0.30 and MySQL Server 5.7.37. The error appears when I run it with MySQL ODBC 8.0.30 and MySQL Server 8.0.30.

I was going to move up to MySQL 8.0.x, so I guess I have to wait for a resolution to this.

MySQL 8.0: Migrating to utf8mb4: Things to Consider (no replies)

Edwin Desouza — Tue, 29 Mar 2022 14:11:36 +0000

https://www.percona.com/blog/migrating-to-utf8mb4-things-to-consider/

MySQL 8.0: utf8mb4 (no replies)

Edwin Desouza — Thu, 13 Jan 2022 05:57:22 +0000

https://twitter.com/isotopp/status/1481383731562795009
"Why the #### did MySQL not upgrade the utf8 charset to 4 bytes, but created utf8mb4 instead?"
"Because indexes matter and they can be large, so changing collations and charsets is impossible."

Deep dive:
https://blog.koehntopp.info/2022/01/12/utf8mb4.html

https://www.percona.com/blog/migrating-to-utf8mb4-things-to-consider/

I can't insert arabic characters (1 reply)

Paola Sigurtà — Thu, 30 Dec 2021 21:56:39 +0000

I have to insert text in various languages, and when I insert Arabic text it is stored as "????????". Can anyone help? Thank you.

Update latin1/utf8 to utf8mb4 (1 reply)

Sivaranjani P — Mon, 20 Sep 2021 19:48:00 +0000

Hi Team,
I am planning to update my databases from the latin1/utf8 character sets to utf8mb4. Please clarify the following:
1. Will there be any data loss during this conversion?
2. I have a varchar(32) column in utf8, and the longest value currently stored in it is 31 characters. When converting to utf8mb4, do I need to increase the varchar(?) length?
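
A hedged note on the second question: as far as I know, the length in VARCHAR(n) counts characters, not bytes, so a 31-character value still fits VARCHAR(32) after conversion. What changes is the worst-case byte size per character (3 for MySQL's utf8, 4 for utf8mb4), which matters for index and row-size limits. The arithmetic, with illustrative values:

```python
VARCHAR_CHARS = 32          # declared column length, in characters
MAX_BYTES_UTF8MB3 = 3       # worst case per character in MySQL utf8
MAX_BYTES_UTF8MB4 = 4       # worst case per character in utf8mb4

value = "x" * 31            # hypothetical 31-character value

# The character count is what is compared against the declared length.
fits = len(value) <= VARCHAR_CHARS
worst_case_before = VARCHAR_CHARS * MAX_BYTES_UTF8MB3
worst_case_after = VARCHAR_CHARS * MAX_BYTES_UTF8MB4

print("fits in VARCHAR(32):", fits)
print("worst-case bytes, utf8:", worst_case_before)
print("worst-case bytes, utf8mb4:", worst_case_after)
```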

MySQL: Character Sets, Unicode, and UCA compliant collations (no replies)

Edwin Desouza — Thu, 22 Jul 2021 17:19:51 +0000

MySQL: Character Sets, Unicode, and UCA compliant collations
- https://blogs.oracle.com/mysql/mysql%3a-character-sets%2c-unicode%2c-and-uca-compliant-collations

Accent sensitive sorting by slovak alphabet (1 reply)

Filip Aufricht — Mon, 30 Aug 2021 20:08:24 +0000

I have a problem with the collation used for sorting results.
I want to order results by the Slovak alphabet using utf8_slovak_ci, but accent-sensitively.
utf8_slovak_ci works, but as I understand it, it isn't accent sensitive.

It sorts results as this:
Sack, John
Sácká, Gabriela
Sačková, Eva
Sacková, Michelle
Sacks, Oliver
but I would like it to be like this:
Sack, John
Sacková, Michelle
Sacks, Oliver
Sačková, Eva
Sácká, Gabriela

How can I order results by the Slovak alphabet accent-sensitively?
Thanks

Japanese voiced and unvoiced characters (5 replies)

Ken Guiche — Wed, 05 May 2021 07:24:38 +0000

MySQL doesn't seem to differentiate between Japanese Dakuon (voiced) characters and Seion (unvoiced) characters. For instance, a SELECT query won't differentiate "きず" (kizu = wound in English) from "きす" (kisu = kiss in English), so when I run a query like "SELECT * FROM mytable WHERE mytable.pronunciation = 'きず'", it returns the entries with both "きず = kizu" and "きす = kisu".
I've tried changing the character set and collation of the table to every combination I can find, but it hasn't made any difference.

Am I doing something wrong, or is this the normal behavior of MySQL?

Thank you,
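
A possible explanation, hedged: in Unicode, ず decomposes into す plus a combining voiced sound mark (U+3099). A collation that is accent-insensitive (the _ai_ family in MySQL 8.0, for example) may treat that mark like any other diacritic and ignore it. The Python sketch below shows the decomposition and a rough stand-in for such a comparison; the helper accent_insensitive_key is hypothetical and is not how MySQL implements collations.

```python
import unicodedata


def accent_insensitive_key(text: str) -> str:
    """Decompose (NFD) and drop combining marks, as a rough stand-in
    for how an accent-insensitive collation compares strings."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


kizu = "\u304D\u305A"  # きず
kisu = "\u304D\u3059"  # きす

# The voiced kana splits into the unvoiced kana plus U+3099.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", kizu)])

print("equal when marks are ignored:",
      accent_insensitive_key(kizu) == accent_insensitive_key(kisu))
print("equal code point by code point:", kizu == kisu)
```

If this is the cause, an accent-sensitive or binary collation on the column or in the comparison would be the thing to try.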

Case insensitive collation identical to Javascript (1 reply)

Rene Prillop — Thu, 29 Apr 2021 09:07:07 +0000

Can someone please confirm or refute whether
- SELECT (a = b) with the utf8mb4_0900_as_ci collation
always gives the same result as
- (a.toLowerCase() == b.toLowerCase()) in JavaScript?
a and b are both non-null Unicode strings.

If not, what would be the best collation (or solution) to make case-insensitive comparison work the same in MySQL and JS?

Thanks.

behaviour of latin1 in mysql (3 replies)

madhur garg — Tue, 20 Apr 2021 01:15:08 +0000

I have some confusion about latin1 behaviour.

As per my understanding, latin1 supports 256 characters and uses 1 byte per character.

I created 2 tables, one with the latin1 charset and one with utf8.
I am using MySQL 5.7.

set names latin1;

CREATE TABLE `foo` (
`i` int(11) DEFAULT NULL,
`v` varchar(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

insert into foo(v) values ('Ũ');
insert into foo(v) values ('ϧ');
insert into foo(v) values ('þ');
mysql> select v, hex(v) from foo;
+------+--------+
| v | hex(v) |
+------+--------+
| Ũ | C5A8 |
| ϧ | CFA7 |
| þ | C3BE |
+------+--------+

--------------------------------------------

CREATE TABLE foo_utf8 ( `i` int(11) DEFAULT NULL, `v` varchar(10) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

insert into foo_utf8(v) values ('þ');
insert into foo_utf8(v) values ('ϧ');
insert into foo_utf8(v) values ('Ũ');

mysql> select v, hex(v) from foo_utf8;
+------+----------+
| v | hex(v) |
+------+----------+
| þ | C383C2BE |
| ϧ | C38FC2A7 |
| Ũ | C385C2A8 |
+------+----------+

-----------------------------------

In both cases the client encoding was latin1.

I am not able to understand:
1. How am I able to insert characters with Unicode code points above 255 into a latin1 table?
2. How am I able to fetch such data from latin1?
3. If I am able to insert and access all characters in latin1, why do I need any other encoding like utf8?
4. Why are the hex values different for the two tables with different charsets, although the client encoding is the same?
5. Why am I able to fetch correct data even though the hex values are different and the client encoding is the same?

Please help me understand this, or share any blogs so that I can go deeper into how charsets and encodings work.

thanks
Madhur
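
The hex values in the second table are consistent with double encoding, which can be reproduced outside MySQL. The sketch assumes the terminal actually sent UTF-8 bytes while the connection was declared as latin1 (set names latin1), so the server treated each byte as a separate latin1 character and converted it to utf8 for the utf8 table. Python's latin-1 codec stands in for MySQL's latin1 here, which is an approximation, and the helper double_encode is hypothetical.

```python
def double_encode(ch: str) -> str:
    """Return the hex a utf8 column would store if UTF-8 bytes are
    mislabelled as latin1 on the way in (assumed scenario)."""
    utf8_bytes = ch.encode("utf-8")            # what the terminal sends
    misread = utf8_bytes.decode("latin-1")     # server believes it is latin1
    return misread.encode("utf-8").hex().upper()


for ch in ["\u00FE", "\u03E7", "\u0168"]:      # þ, ϧ, Ũ
    raw = ch.encode("utf-8").hex().upper()
    print(f"U+{ord(ch):04X}: raw {raw}, double-encoded {double_encode(ch)}")
```

Under the same assumption, a latin1 table stores the incoming bytes unchanged, which would explain the shorter hex values there. Reading back through the same latin1 connection reverses the steps, so the terminal shows the original characters in both cases.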

Issue with UTF-8 in MYSQL 5.7 (1 reply)

Shanu Kumar — Sun, 11 Apr 2021 16:30:18 +0000

Hi everyone,

Greetings. My name is Shanu. I am a developer in this company called Typeset (www.typeset.io).

We use MySQL 5.7 internally in our company, and I am constantly running into UTF-8 troubles. Let me describe the situation.

On the frontend, we have a real-time editor in which the user can enter data. Once the user enters this data, it should be stored properly in the database. However, here are the results I see in the database:

(a.) Question marks (????) instead of Chinese/ Japanese characters.
(b.) Black diamonds coming up, such as Mi�uel.
(c.) Random text (eg. æ–°æµ for 新浪)

What should I do? I have done research on the internet, but I have not been able to understand the problem clearly. Any help in the right direction would be appreciated.
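
The three symptoms usually come from three different encoding mismatches, and each can be reproduced in a few lines. This is a sketch under assumptions: Python's latin-1 and cp1252 codecs stand in for whatever character sets the application, connection, and tables actually use, and the sample word in case (b) is hypothetical.

```python
text = "\u65B0\u6D6A"  # 新浪

# (a) Question marks: the target character set cannot represent the text,
#     so each character is replaced during conversion.
question_marks = text.encode("latin-1", errors="replace").decode("latin-1")

# (c) Mojibake: UTF-8 bytes are decoded as if they were cp1252.
mojibake = text.encode("utf-8").decode("cp1252")

# (b) Black diamond: latin1 bytes are decoded as if they were UTF-8.
latin1_bytes = "caf\u00E9".encode("latin-1")   # hypothetical sample word
black_diamond = latin1_bytes.decode("utf-8", errors="replace")

print(question_marks)
print(mojibake)
print(black_diamond)
```

Case (a) loses data permanently, while (b) and (c) are often recoverable because the original bytes may still be intact.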

Unable to query with Chinese character (4 replies)

Lamarche Lam — Mon, 05 Jul 2021 19:11:07 +0000

Hello everyone,
I'm new to MySQL, so I was practicing with a MySQL tutorial.

One question in this tutorial requires us to query a record using the LIKE operator, so I wrote:

select
Tname
from Teacher
where Tname like '李%';

and the result was Tname: (nothing showed up except the column name).
(All databases and tables in this tutorial were set to utf8 before any query was performed.)

Someone suggested that I set the encoding to utf8 in the configuration file (my.cnf), so I added:

[client]
default-character-set=utf8

[mysql]
default-character-set=utf8

[mysqld]
character-set-server=utf8

to the my.cnf file and applied it in the System Preferences panel.

However, it doesn't work at all.
Could somebody please show me a way out of this?
Thank you so much.

Blackhole engine and utf8 character set (1 reply)

Craig Healey — Fri, 23 Oct 2020 12:58:41 +0000

I'm replicating a 5.5 DB to 5.6. The default storage engine on 5.5 is InnoDB, and on 5.6 is Blackhole. The character set system variables are all the same. Creating the tables, all work except one.

create table t1 (
`uuid` varchar(255) NOT NULL,
`device_uuid` varchar(255) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`uuid`,`device_uuid`,`created_at`)
) DEFAULT CHARSET=utf8;

ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes

I can create the table as InnoDB, and I know utf8 in MySQL isn't "actually" utf8. But I wondered if anyone knew the reason why InnoDB and Blackhole don't produce the same error?
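
The numbers behind the error can be worked out by hand. The sketch below is only arithmetic under stated assumptions: MySQL's utf8 reserves up to 3 bytes per character, a TIMESTAMP key part is taken as 4 bytes, per-column length bytes are ignored, and the InnoDB limits (767 bytes per indexed column with the older row formats, 3072 bytes for the whole key) are quoted from memory of the InnoDB documentation rather than from this thread.

```python
MAX_BYTES_PER_CHAR_UTF8 = 3    # MySQL utf8 (utf8mb3) worst case
TIMESTAMP_BYTES = 4            # assumed size of a TIMESTAMP key part

varchar_part = 255 * MAX_BYTES_PER_CHAR_UTF8
total_key_bytes = 2 * varchar_part + TIMESTAMP_BYTES

WHOLE_KEY_LIMIT_FROM_ERROR = 1000   # taken from the error message
INNODB_PER_COLUMN_LIMIT = 767       # assumption, see lead-in
INNODB_WHOLE_KEY_LIMIT = 3072       # assumption, see lead-in

print("bytes per varchar(255) key part:", varchar_part)
print("bytes for the whole primary key:", total_key_bytes)
print("exceeds the 1000-byte limit:", total_key_bytes > WHOLE_KEY_LIMIT_FROM_ERROR)
print("fits the assumed InnoDB limits:",
      varchar_part <= INNODB_PER_COLUMN_LIMIT
      and total_key_bytes <= INNODB_WHOLE_KEY_LIMIT)
```

If those assumptions hold, the two engines simply enforce different key-size limits, so the same definition passes one check and fails the other.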

MySQL: Some Character Set Basics (no replies)

Edwin Desouza — Mon, 17 Aug 2020 21:07:17 +0000

MySQL: Some Character Set Basics
https://blog.koehntopp.info/2020/08/18/mysql-character-sets.html

An in depth DBA's guide to migrating a MySQL database from the `utf8` to the `utf8mb4` charset (no replies)

Edwin Desouza — Mon, 20 Jul 2020 14:38:28 +0000

https://saveriomiroddi.github.io/An-in-depth-dbas-guide-to-migrating-a-mysql-database-from-the-utf8-to-the-utf8mb4-charset/

Changing collation type slows down queries (no replies)

Doug Barger — Thu, 23 Apr 2020 18:10:29 +0000

I don't understand the issue here. I have some tables that come to me as latin1_swedish_ci. I need to do case-sensitive joins, so I changed all the tables involved in both databases to utf8mb4_0900_as_cs. Now my queries are very slow! EXPLAIN shows that the indexes are being used. What can be the issue? I am using the Community edition on a Microsoft Windows desktop.

Config or Installation Issue (10 replies)

Luis Benitez-Martell — Tue, 21 Apr 2020 05:39:17 +0000

Data in a varchar field produces an error that prevents a record from being saved, because the data contains characters (escape sequences) that print Spanish accents. The fields of any table must be able to hold information in Spanish along with some English words or short expressions in the same piece of data. I think it is a configuration issue, because the same application using MSSQL as the database engine accepted the string data exactly as it came from the comma-delimited sequential (text) source file. This is the SQL statement and the error message:

Insert INTO prodfarm (Codigo_Prod, Nombre, Prin_Activo, Forma_Farma, Activ_Terap, No_Reg_SSA, Fecha_Revision, Presentaciones, Caducidades ) Values (
'REG-0135', 'Evastel D', 'Ebastina 10 mg / Pseudoefedrina 120 mg', 'Cápsulas liberación prolongada', 'Anti-histamínico, Descongestivo', '622M98 SSA II', '2005-10-03', 'Caja con 10 y 5 cápsulas en envase de burbuja', '36 MESES' );

Error Code: 1366. Incorrect string value: '\xC3\xBApsul...' for column 'Presentaciones' at row 1

The weird thing is that other fields, such as Activ_Terap, accept accented words (histamínico) and allow the record to be saved, but the field Presentaciones does not.
Can anyone help me solve this issue? I would appreciate it a lot!

As a workaround, I replaced the accented vowel with an unaccented one, i.e. cápsula is replaced with capsula. The record could then be stored, but it's a misspelling.
Thank you.

UTF-8 Everywhere (no replies)

Edwin Desouza — Tue, 14 Apr 2020 16:16:49 +0000

About the authors

This manifesto was written by Pavel Radzivilovsky, Yakov Galka and Slava Novgorodov. It is a result of our experience and research of real-world Unicode issues and mistakes done by real-world programmers. Our goal here is to improve awareness of text issues and to inspire industry-wide changes to make Unicode-aware programming easier, ultimately improving the experience of users of those programs written by human engineers. Neither of us is involved in the Unicode consortium.

Special thanks to Glenn Linderman for providing information about Python, and to Markus Künne, Jelle Geerts, Lazy Rui and Jan Rüegg for reporting bugs and typos in this document.

Much of the text was inspired by discussions on StackOverflow initiated by Artyom Beilis, the author of Boost.Locale. Additional inspiration came from the development conventions at VisionMap and Michael Hartl’s tauday.org.

UTF-8 Everywhere
- http://utf8everywhere.org/
- https://news.ycombinator.com/item?id=22867503

TicketSolve: Upgrading from MySQL 5.7 to 8.0 (Character Sets and Collations) (no replies)

Edwin Desouza — Sun, 23 Feb 2020 21:45:41 +0000

Upgrade and UTF8:
- https://saveriomiroddi.github.io/An-in-depth-dbas-guide-to-migrating-a-mysql-database-from-the-utf8-to-the-utf8mb4-charset/
- https://saveriomiroddi.github.io/Pre-fosdem-talk-upgrading-from-mysql-5.7-to-8.0/

Adding custom french EBCDIC collation for unicode (no replies)

Sergey Vovnenko — Wed, 22 Jan 2020 15:03:09 +0000

Hi everyone,

I have a task to add a custom EBCDIC collation with French symbols.
This collation is based on https://en.wikipedia.org/wiki/EBCDIC_1047

1)
I've already tried adding a custom collation to latin1.

I added this weights map to latin1.xml:

00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
20 3A 5F 5B 3B 4C 30 5D 2D 3D 3C 2E 4B 40 2B 41
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 5A 3E 2C 5E 4E 4F
5C C1 C2 C3 C4 C5 C6 C7 C8 C9 D1 D2 D3 D4 D5 D6
D7 D8 D9 E2 E3 E4 E5 E6 E7 E8 E9 AD E0 BD 3F 4D
59 61 62 63 64 65 66 67 68 69 71 72 73 74 75 76
77 78 79 A2 A3 A4 A5 A6 A7 A8 A9 C0 2F D0 A1 FF
80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
21 AA 2A B1 7F B2 4A B5 BB B4 7A 6A B0 CA AF BC
70 6F EA FA BE A0 B6 B3 7D DA 7B 6B B7 B8 B9 AB
44 44 42 46 43 47 7E 48 54 51 52 53 58 55 56 57
AC 49 ED EE EB EF EC BF 60 FD FE FB FC BA AE 39
24 25 22 26 23 27 7C 28 34 31 32 33 38 35 36 37
6C 29 CD CE CB CF CC E1 50 DD DE DB DC 6D 6E DF

With this approach everything works fine except for Unicode symbols.
Under this collation they are converted to "?" and all have the same weight.
During sorting, Unicode symbols are placed in the middle of the list (after "?") instead of at the end.

2)
I have also tried the solution (Defining a UCA Collation Using LDML Syntax) described here: https://dev.mysql.com/doc/refman/5.7/en/ldml-collation-example.html
I added rules to the utf8 charset.
But it doesn't seem to honor the order that I set. It just puts the symbols at the beginning, without following the order I specified in this block:

\u0000
\u0020
\u00A0
\u00E2
\u00E4
\u00E0
\u00E1
\u00E3
\u00E5
...

Maybe someone can give me a hint on how to add a custom collation based on EBCDIC 1047 that supports Unicode symbols (symbols outside EBCDIC 1047 must be placed at the end, with maximum weight).

Thank you in advance!

MySQL 8.0: A Tale of UDFs with Character Sets (no replies)

Edwin Desouza — Fri, 17 Jan 2020 23:43:08 +0000

MySQL 8.0: A Tale of UDFs with Character Sets
https://mysqlserverteam.com/a-tale-of-udfs-with-character-sets/

MYSQL DB not showing utf8 on php page (8 replies)

Ben Je — Thu, 16 Jan 2020 06:22:28 +0000

Hello everyone,
I created a database and a table, and I created a PHP page. When I visit the PHP page, the foreign characters change to question marks. This is what I did to troubleshoot:

1- php page contains the tag:
2- I tested whether the problem is with the MySQL DB by adding text directly to the PHP page, and it displayed correctly. So the problem is not that the PHP page lacks UTF-8 support.
3- The MySQL DB is using MyISAM with utf8_general_ci
4- When I go to the table inside the DB, I see the text displayed correctly.
5- When I test on my localhost, everything works perfectly. When I deploy it to the Bluehost server, I get ???? instead of the actual characters.

What am I missing? Thanks!

Summary of trailing spaces handling in MySQL, with version 8.0 upgrade considerations (no replies)

Edwin Desouza — Mon, 02 Dec 2019 00:31:04 +0000

https://saveriomiroddi.github.io/Summary-of-trailing-spaces-handling-in-MySQL-with-version-8.0-upgrade-considerations/

An in depth DBA's guide to migrating a MySQL database from the `utf8` to the `utf8mb4` charset (no replies)

Edwin Desouza — Mon, 02 Dec 2019 00:29:20 +0000

https://saveriomiroddi.github.io/An-in-depth-dbas-guide-to-migrating-a-mysql-database-from-the-utf8-to-the-utf8mb4-charset/

Case insensitive search in utf8 db (14 replies)

John Stergiou — Mon, 02 Dec 2019 20:47:32 +0000

I have a db with utf8 values in Greek like:
ΓΙΑΝΝΗΣ
Γιάννης
Γιαννης
(it is the word John in Greek) and I want to write a query that finds all 3 of these instances. In other words, I have words in upper case, words in lower case, and words with or without the tonos (the accent mark on the letter α in the second row of my example). Another word may also come before or after that word. I tried adding " COLLATE NOCASE" at the end of the query, but it didn't help. I then tried: SELECT * FROM Table WHERE UPPER(item) LIKE UPPER('%text%')
and now I can find ΓΙΑΝΝΗΣ when searching for "γιαννησ". The final letter "ς" is treated as different from "Σ", because the lower case of "Σ" is "σ" (when a Greek word ends with "σ", we write "ς" instead). What can I do to deal with this?
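
To make the goal concrete, here is a sketch of the comparison key being asked for, written in Python rather than SQL. It strips the tonos by decomposing and dropping combining marks, then applies Unicode case folding, which maps both "Σ" and final "ς" to "σ". The helper name search_key is hypothetical; inside MySQL an accent- and case-insensitive collation would be the closer equivalent, though which collation handles final sigma this way is not something this sketch verifies.

```python
import unicodedata


def search_key(text: str) -> str:
    """Build an accent- and case-insensitive key for comparison."""
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # casefold() maps capital sigma and final sigma to the same letter.
    return without_marks.casefold()


names = ["ΓΙΑΝΝΗΣ", "Γιάννης", "Γιαννης"]
keys = [search_key(name) for name in names]

for name, key in zip(names, keys):
    print(name, "->", key)

print("all equal:", len(set(keys)) == 1)
```

Storing such a key in an extra indexed column and searching against it is one option when the collation alone does not give the wanted result.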