Hi Rick,
The optimization of IN-expression that I am talking about is the semijoin optimzations introduced MySQL 5.6. See:
Manual:
http://dev.mysql.com/doc/refman/5.7/en/subquery-optimization.html
Blog:
http://roylyseng.blogspot.co.uk/2012/04/semi-join-in-mysql-56.html
The basic idea of semi-join is to convert the IN-expression into an inner join with some additional logic to remove duplicates. (Duplicates will appear when there are multiple matches in the result set of a subquery for a row of the outer query.) The main advantage of the conversion to join is that the optimizer will then be able to process the tables in any order; i.e., it is not restricted to process outer tables before the inner tables.
There are several strategies for removing duplicates. One of them, MaterializationLookup, will do as you describe and create indexed temporary tables for the query. Another one, LooseScan, will use an existing index and skip subsequent identical values to avoid duplicates.
For the queries where the result set of the subquery is much larger than the number of candidate rows from the outer query, the FirstMatch strategy will probably be used. AFAIU, this is the case with the query discussed in this thread. The FirstMatch is basically the same strategy as used pre-5.6. Hence, I do not expect performance to improve for this query if IN is used instead of EXISTS.
Øystein Grøvlen,
Senior Principal Software Engineer,
MySQL Group, Oracle,
Trondheim, Norway