Re: Execution plan: wrong index selected, very poor performance.
> the magic number is somewhere between 8 and 9M rows.
That's out of 56947209 rows.
That's about 15%.
That's to be expected.
Here's what is going on...
Case 1: A few rows will be selected by the WHERE clause. This is an excellent use of an INDEX.
Case 2: Most or all of the table matches the WHERE clause. Now, if it uses the index, it needs to scan through most or all of the index, reaching over into the data for each matching row. Back and forth, back and forth. A lot of work. This is a terrible use of the INDEX; a "table scan" is much faster.
Case 3. What to do if 15% of the rows match? Well, somewhere in the middle, the optimizer draws a line between case 1 and case 2. There is no reasonable way for it to know exactly where optimal dividing line is, so it makes a guess. Typically it is somewhere between 10% and 30%, but it seems to depend on the phase of the moon.
> Is it possible to influence optimizer when to prefer full table scan vs. index?
Yes, but it usually hurts performance. See FORCE INDEX.
> Compound index worked, took ages to create it though
5.8GB of data and 2.5GB of indexes were rebuilt -- Disk I/O takes time.
> and increased index size by over 1 gig.
Seems reasonable for an index into 56M rows.
Still not happy with the speed? "Count the disk hits." MySQL needs to read a lot of stuff to satisfy your query. What I did with the new index is minimize how much it needs to read. But that is still a lot.
The real trick to speeding it up is "summary tables". This involves incrementally adding to a table that contains subtotals by, say, the hour. Lots of links here:
http://mysql.rjweb.org/bestof.html . 10x to 1000x speedup may be possible.