MySQL Forums
Forum List  »  Partitioning

Re: Partitioning with range(to_days) issue
Posted by: Rick James
Date: November 03, 2009 09:32AM

COUNT(*) has so many differences that one should not extrapolate from its EXPLAIN to other situations.

Partitioning by date is clumsier than it ought to be. Here are some snippets of PHP code that worked for me:

Building the table (by YEAR):
    $parts = array();
    foreach (range($yra, $yrz) as $yr)
    {
        $cap = $yr+1;
        $parts[] = "  PARTITION p$yr VALUES LESS THAN ($cap)";
    }
    $parts[] =     "  PARTITION p9999 VALUES LESS THAN MAXVALUE";
    $part_str = implode(",\n", $parts);
    $sql = "CREATE TABLE ... (
        dt DATETIME NOT NULL,
        ...
    )
    PARTITION BY RANGE( YEAR(dt) ) (
        $part_str
    )";
    Execute(...)
Note the use of naming the partitions by what will be in them, not the upper bound -- this helps me keep my sanity.

Adding a new partition seems to be tricky -- delete the last partition, then add the new one, plus a new last:
    $sql = "ALTER TABLE PartYr DROP PARTITION p9999";
    Execute(...)
    $cap = $yr+1;
    $sql = "ALTER TABLE PartYr ADD PARTITION (
                    PARTITION p$yr VALUES LESS THAN ($cap),
                    PARTITION p9999 VALUES LESS THAN MAXVALUE )";
    Execute(...)
This looks simpler, but I haven't tested it yet:
 ALTER TABLE PartYr
 REORGANIZE PARTITION p9999 INTO (
    PARTITION p2010 VALUES LESS THAN (2010),
    PARTITION p9999 VALUES LESS THAN MAXVALUE  );
  -- http://dev.mysql.com/tech-resources/articles/mysql_5.1_partitions.html

I've done a few projects with Summary tables -- I find that there are not that many variations on a theme. Suggest you look through your slowlog to find out what the users are doing.

Also, I find that the 10x speedup by using a Summary table makes the pain worthwhile.

The thought process I use when tackling a new situation:
* Users don't want 10K rows of output; they want it summarized on something.
* Users often want to filter on a date range and summarize by day (or hour). Therefore, 'date' is probably the first part of the primary key. Decide what unit to use -- usually day or hour. Note that week (etc) can be derived from any smaller unit.
* The rest of the primary key should be a few, but not too many, other dimensions (or fields they might want to filter on). Consider having multiple indexes (each starting with date).
* Goal: A summary table should have one-tenth as many rows as the raw data. (Actual: anywhere from 2% to 70%.) That, together with probably having fewer columns, makes a summary table ~10x faster to report from.
* Denormalize. The raw data is probably heavily normalized (good); the summary table may be less normalized.
* COUNT(*) -- Users often want to know "how many". Sum the counts to get the count for a week from daily rows.
* SUM(*), but not AVG(*) -- use sum of sums divided by sum of counts to get the mathematically correct average.
* MAX and MIN are possible.
* Median, percentiles, etc, though "impossible", can be approximated (with pain).
* Standard deviation is possible (but I have never needed it).
* Have an API layer that takes the user's inputs and decides which Summary table to use, possibly even doing multiple SELECTs and/or UNIONs to achieve the goal.
* Expect to build 1-7 Summary tables.
* Consider turning small ENUMs into multiple columns; eg, SUM(gender='M') AS Males, SUM(gender='F') as Females.

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: Partitioning with range(to_days) issue
3789
November 03, 2009 09:32AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.