I posted earlier today about my frustration that 'a' compares equal to a-acute, and about similar comparison results. I wanted to understand the problem better, so I took a look at the server code that implements the Unicode Collation Algorithm (UCA).
The important source file appears to be ctype-uca.c. It contains 256 tables of 16-bit numbers, grouped into sets of 3, 4, or 5 depending on the table. Many of the sets contain only one non-zero entry.
Comparing this to allkeys.txt at http://www.unicode.org/Public/UCA/latest/allkeys.txt , it looks like a lot of the data has been zeroed out. Is this what the comment at the top of the source file, "Only Primary level key comparison", means?
For my situation, with diacritical marks, it looks like allkeys.txt gives an accented character two collation elements: the first identical to that of the unadorned character, and the second identical to that of the combining mark. For example:
0061 ; [.0E33.0020.0002.0061] # LATIN SMALL LETTER A
00E1 ; [.0E33.0020.0002.0061][.0000.0032.0002.0301] # LATIN SMALL LETTER A WITH ACUTE; QQCM
There appears to be no place for the second entry in the tables in the source file. Is this what the comment "No combining marks processing is done" means?
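To check my reading of the data, here is a rough Python sketch of how I understand a UCA sort key to be built from the two allkeys.txt lines quoted above. It is not the server's code; the names parse_allkeys and sort_key are my own, and it handles only the characters in the sample. With a primary-only key the second collation element of a-acute contributes nothing, because its primary weight is 0000, so the two characters compare equal. Adding the secondary level tells them apart.

```python
import re

# The two allkeys.txt lines quoted above (four weights per element).
ALLKEYS_SAMPLE = """\
0061 ; [.0E33.0020.0002.0061] # LATIN SMALL LETTER A
00E1 ; [.0E33.0020.0002.0061][.0000.0032.0002.0301] # LATIN SMALL LETTER A WITH ACUTE; QQCM
"""

# One collation element: "[" + "." or "*" + dot-separated hex weights + "]".
ELEMENT_RE = re.compile(r"\[[.*]([0-9A-F.]+)\]")


def parse_allkeys(text):
    """Map each character to its list of collation elements (weight tuples)."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line or line.startswith("@"):  # skip blanks and directives
            continue
        codepoints, elements = line.split(";", 1)
        key = "".join(chr(int(cp, 16)) for cp in codepoints.split())
        table[key] = [
            tuple(int(w, 16) for w in m.group(1).split("."))
            for m in ELEMENT_RE.finditer(elements)
        ]
    return table


def sort_key(s, table, levels):
    """Build a sort key from the first `levels` weight levels.

    Zero weights are ignored at each level, and a 0 separates the levels.
    Hypothetical helper: it assumes every character of s is in the table.
    """
    elements = [ce for ch in s for ce in table[ch]]
    key = []
    for level in range(levels):
        key.extend(ce[level] for ce in elements if ce[level] != 0)
        key.append(0)  # level separator
    return tuple(key)


table = parse_allkeys(ALLKEYS_SAMPLE)

# Primary weights only: 'a' and a-acute produce the same key.
print(sort_key("a", table, 1) == sort_key("\u00e1", table, 1))

# Primary plus secondary weights: the 0032 from the acute now matters.
print(sort_key("a", table, 2) == sort_key("\u00e1", table, 2))
```

If this is right, then keeping only the first weight of the first element, which is what the tables in ctype-uca.c seem to do, would explain the behaviour I am seeing.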
One option for my company would be to contribute a fix to the MySQL source code and solve our problem that way.