Alexander,
Can you point me to a SQL document which supports your claim that "any operation
which involves comparison must honor the argument's collations" ?
While your suggestion to use "collate utf8_bin" works, it is not intuitive.
It means that if accented characters are allowed to appear in the database
and there are uniqueness requirements on the character data,
then the binary collation must be used for the database/table/column creation,
and the applications which query the database are then responsible for
specifying the collation to be used for comparisons in all select statements.
I thought I'd point out that Oracle handles this differently.
(I'm not claiming that Oracle has the better solution, but it is probably
a good measure of the expected behaviour.) Oracle uses the "alter session" statement
to establish values for nls_sort and nls_comp which are used for specifying the collation
sequence for sorts and comparisons. From the Oracle 10g manual (e.g.,
http://www.stanford.edu/dept/itss/docs/oracle/10g/server.101/b10749/ch3globenv.htm):
---
NLS_SORT specifies the type of sort for character data. It overrides the
default value that is derived from NLS_LANGUAGE.
The value of NLS_SORT affects the following SQL operations: WHERE, START WITH,
IN/OUT, BETWEEN, CASE WHEN, HAVING, ORDER BY. All other SQL operators make
comparisons in binary mode only.
The value of NLS_COMP affects the following SQL operations: WHERE, START WITH,
IN/OUT, BETWEEN, CASE WHEN, HAVING, ORDER BY. All other SQL operators make
comparisons in binary mode only.
You can use NLS_COMP to avoid the cumbersome process of using the NLSSORT
function in SQL statements when you want to perform a linguistic comparison
instead of a binary comparison. When NLS_COMP is set to ANSI, SQL operations
perform a linguistic comparison based on the value of NLS_SORT.
---
In analogy to the Oracle behaviour, I expected to be able to create a database/table/column
with the desired collation which would be used for the listed operations, but that all other
SQL operators would make comparisons in binary mode only.