COUNT(*) is handled so differently by the optimizer that one should not extrapolate from its EXPLAIN to other situations.
Partitioning by date is clumsier than it ought to be. Here are some snippets of PHP code that worked for me:
Building the table (by YEAR):
$parts = array();
foreach (range($yra, $yrz) as $yr) {
    $cap = $yr + 1;
    $parts[] = " PARTITION p$yr VALUES LESS THAN ($cap)";
}
$parts[] = " PARTITION p9999 VALUES LESS THAN MAXVALUE";
$part_str = implode(",\n", $parts);
$sql = "CREATE TABLE ... (
            dt DATETIME NOT NULL,
            ...
        )
        PARTITION BY RANGE( YEAR(dt) ) (
            $part_str
        )";
Execute(...)
Note that each partition is named after the year it will contain, not after the upper bound -- this helps me keep my sanity.
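For example, with $yra = 2008 and $yrz = 2010, the code above generates (modulo whitespace) a statement like this:

```sql
CREATE TABLE ... (
    dt DATETIME NOT NULL,
    ...
)
PARTITION BY RANGE( YEAR(dt) ) (
    PARTITION p2008 VALUES LESS THAN (2009),
    PARTITION p2009 VALUES LESS THAN (2010),
    PARTITION p2010 VALUES LESS THAN (2011),
    PARTITION p9999 VALUES LESS THAN MAXVALUE
)
```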
Adding a new year's partition is trickier than it ought to be -- drop the catch-all last partition, then add the new year's partition plus a fresh catch-all. (DROP PARTITION discards any rows in the partition, so this assumes p9999 is empty, as it should be if this is run on schedule.)
$sql = "ALTER TABLE PartYr DROP PARTITION p9999";
Execute(...)
$cap = $yr + 1;
$sql = "ALTER TABLE PartYr ADD PARTITION (
            PARTITION p$yr VALUES LESS THAN ($cap),
            PARTITION p9999 VALUES LESS THAN MAXVALUE )";
Execute(...)
This looks simpler, but I haven't tested it yet:
ALTER TABLE PartYr
REORGANIZE PARTITION p9999 INTO (
PARTITION p2010 VALUES LESS THAN (2010),
PARTITION p9999 VALUES LESS THAN MAXVALUE );
-- http://dev.mysql.com/tech-resources/articles/mysql_5.1_partitions.html
I've done a few projects with Summary tables -- I find that there are not that many variations on the theme. I suggest looking through your slow log to find out what the users are actually asking for.
Also, I find that the 10x speedup by using a Summary table makes the pain worthwhile.
The thought process I use when tackling a new situation:
* Users don't want 10K rows of output; they want it summarized on something.
* Users often want to filter on a date range and summarize by day (or hour). Therefore, 'date' is probably the first part of the primary key. Decide what unit to use -- usually day or hour. Note that week (etc) can be derived from any smaller unit.
* The rest of the primary key should be a few, but not too many, other dimensions (or fields they might want to filter on). Consider having multiple indexes (each starting with date).
* Goal: A summary table should have one-tenth as many rows as the raw data. (Actual: anywhere from 2% to 70%.) That, together with probably having fewer columns, makes a summary table ~10x faster to report from.
* Denormalize. The raw data is probably heavily normalized (good); the summary table may be less normalized.
* COUNT(*) -- Users often want to know "how many". Store the count as a column in each daily row; summing those counts gives the count for a week.
* SUM(x), but not AVG(x) -- store the sum (and the count); the mathematically correct average for a week is the sum of the sums divided by the sum of the counts, not the average of the daily averages.
* MAX and MIN are possible.
* Median, percentiles, etc, though "impossible", can be approximated (with pain).
* Standard deviation is possible (but I have never needed it).
* Have an API layer that takes the user's inputs and decides which Summary table to use, possibly even doing multiple SELECTs and/or UNIONs to achieve the goal.
* Expect to build 1-7 Summary tables.
* Consider turning small ENUMs into multiple columns; eg, SUM(gender='M') AS Males, SUM(gender='F') AS Females.
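To make the points above concrete, here is a sketch of a daily summary table and a weekly report against it. The table, column, and dimension names (DailySummary, host, latency, gender) are hypothetical stand-ins, not from any particular project:

```sql
-- Hypothetical daily summary: PRIMARY KEY starts with the date (unit: day),
-- followed by one low-cardinality dimension users filter on.
CREATE TABLE DailySummary (
    dy          DATE         NOT NULL,
    host        VARCHAR(30)  NOT NULL,
    ct          INT UNSIGNED NOT NULL,  -- COUNT(*) of the raw rows
    latency_sum DOUBLE       NOT NULL,  -- SUM(latency); do NOT store AVG
    latency_max DOUBLE       NOT NULL,  -- MAX(latency)
    Males       INT UNSIGNED NOT NULL,  -- SUM(gender='M'): ENUM split into columns
    Females     INT UNSIGNED NOT NULL,  -- SUM(gender='F')
    PRIMARY KEY (dy, host)
);

-- Weekly report derived from the daily rows:
SELECT  YEARWEEK(dy)               AS wk,
        SUM(ct)                    AS how_many,      -- counts add up
        SUM(latency_sum) / SUM(ct) AS avg_latency,   -- sum of sums / sum of counts
        MAX(latency_max)           AS worst_latency, -- MAX of MAXes is valid
        SUM(Males)                 AS Males,
        SUM(Females)               AS Females
    FROM  DailySummary
    WHERE dy >= '2010-01-01'  AND  dy < '2010-04-01'
    GROUP BY wk;
```

Note how every reported value is rebuilt from sums, counts, and maxes -- the two aggregates (AVG, and anything like a median) that cannot be recombined that way are exactly the ones not stored.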