Re: MyISAM using 'wrong' index
Let's analyze this:
> The actual select takes 4.4 sec to return 894 rows.
> KEY `idx_viewability_io_date` (`vie_InsertionOrderId`,`vie_Date`,`vie_Provider`)
> SELECT distinct vie_InsertionOrderId FROM `insertion_order_domain_size_viewability`;
> '1047', 'Using index for group-by'
> contains 8.6 million records
You have a large table, which is probably not well cached. There is an index with 8.6M rows in it, so the index is perhaps 200MB. The optimizer picked 'Using index for group-by' and leapfrogged through the 200MB to discover 894 changes in `vie_InsertionOrderId`. Assuming that index was not cached, that could be more or less 894 disk hits. On ordinary spinning drives, that could take 9 seconds. So, 4.4s seems pretty good.
I suspect that it you ran the same query a second time, it would run in less than a second.
You are using MyISAM (not the preferred Engine). key_buffer_size controls the amount of RAM set aside for caching MyISAM indexes, and where the 894 disk blocks might be cached. That setting should be about 20% of available RAM. I see your key_buffer is 16GB; do you have 80GB of RAM?
Now let's analyze
> WHERE vie_InsertionOrderId = 198352 AND vie_Date BETWEEN '2015-06-23' AND '2015-06-29';
> takes 0.562sec to complete.
That _should_ use the same index. It is using the first two fields this time. And it is an excellent index for the query. It drills down the BTree for 198352-'2015-06-23', then scans forward through '2015-06-29'.
However, since it estimated that where were only '2' rows in that date range (perhaps incorrectly), it decided to use the index starting with vie_date (the PRIMARY KEY). As you say, the number is actually 151,188.
Sometimes (rarely) the statistics get out of whack and need to be kicked. You can use ANALYZE TABLE to recalculate the stats and _perhaps_ get better explain plans. However, it will scan the entire table; this may be too slow for you to want to do. (But I guess you know how long it takes.) I have never see a case where ANALYZE (or OPTIMIZE) is worth running more than once a month; certainly not daily.
OPTIMIZE rebuilds the table and ANALYZEs. The rebuilding will squeeze out any fragmented space.
Since you use no fields in the other table, the `LEFT JOIN ref_domain` is useless. But the optimizer does the lookups anyway.
Sorry, I guess I did not really answer your question.