MySQL :: Re: Partitioning with range(to

New Topic

Re: Partitioning with range(to_days) issue

Posted by: Rick James
Date: November 03, 2009 09:32AM

COUNT(*) has so many differences that one should not extrapolate from its EXPLAIN to other situations.

Partitioning by date is clumsier than it ought to be. Here are some snippets of PHP code that worked for me:

Building the table (by YEAR):

    $parts = array();
    foreach (range($yra, $yrz) as $yr)
    {
        $cap = $yr+1;
        $parts[] = "  PARTITION p$yr VALUES LESS THAN ($cap)";
    }
    $parts[] =     "  PARTITION p9999 VALUES LESS THAN MAXVALUE";
    $part_str = implode(",\n", $parts);
    $sql = "CREATE TABLE ... (
        dt DATETIME NOT NULL,
        ...
    )
    PARTITION BY RANGE( YEAR(dt) ) (
        $part_str
    )";
    Execute(...)

Note the use of naming the partitions by what will be in them, not the upper bound -- this helps me keep my sanity.

Adding a new partition seems to be tricky -- delete the last partition, then add the new one, plus a new last:

    $sql = "ALTER TABLE PartYr DROP PARTITION p9999";
    Execute(...)
    $cap = $yr+1;
    $sql = "ALTER TABLE PartYr ADD PARTITION (
                    PARTITION p$yr VALUES LESS THAN ($cap),
                    PARTITION p9999 VALUES LESS THAN MAXVALUE )";
    Execute(...)

This looks simpler, but I haven't tested it yet:

 ALTER TABLE PartYr
 REORGANIZE PARTITION p9999 INTO (
    PARTITION p2010 VALUES LESS THAN (2010),
    PARTITION p9999 VALUES LESS THAN MAXVALUE  );
  -- http://dev.mysql.com/tech-resources/articles/mysql_5.1_partitions.html

I've done a few projects with Summary tables -- I find that there are not that many variations on a theme. Suggest you look through your slowlog to find out what the users are doing.

Also, I find that the 10x speedup by using a Summary table makes the pain worthwhile.

The thought process I use when tackling a new situation:
* Users don't want 10K rows of output; they want it summarized on something.
* Users often want to filter on a date range and summarize by day (or hour). Therefore, 'date' is probably the first part of the primary key. Decide what unit to use -- usually day or hour. Note that week (etc) can be derived from any smaller unit.
* The rest of the primary key should be a few, but not too many, other dimensions (or fields they might want to filter on). Consider having multiple indexes (each starting with date).
* Goal: A summary table should have one-tenth as many rows as the raw data. (Actual: anywhere from 2% to 70%.) That, together with probably having fewer columns, makes a summary table ~10x faster to report from.
* Denormalize. The raw data is probably heavily normalized (good); the summary table may be less normalized.
* COUNT(*) -- Users often want to know "how many". Sum the counts to get the count for a week from daily rows.
* SUM(*), but not AVG(*) -- use sum of sums divided by sum of counts to get the mathematically correct average.
* MAX and MIN are possible.
* Median, percentiles, etc, though "impossible", can be approximated (with pain).
* Standard deviation is possible (but I have never needed it).
* Have an API layer that takes the user's inputs and decides which Summary table to use, possibly even doing multiple SELECTs and/or UNIONs to achieve the goal.
* Expect to build 1-7 Summary tables.
* Consider turning small ENUMs into multiple columns; eg, SUM(gender='M') AS Males, SUM(gender='F') as Females.

Navigate: Previous Message• Next Message

Options: Reply• Quote

Subject

Views

Written By

Posted

Partitioning with range(to_days) issue

11552

Jools Lebowski

October 29, 2009 11:13AM

Re: Partitioning with range(to_days) issue

3726

Rick James

October 31, 2009 12:00PM

Re: Partitioning with range(to_days) issue

3401

Jools Lebowski

November 02, 2009 04:07AM

Re: Partitioning with range(to_days) issue

3243

Rick James

November 02, 2009 10:22AM

Re: Partitioning with range(to_days) issue

3225

Jools Lebowski

November 03, 2009 02:18AM

Re: Partitioning with range(to_days) issue

3964

Rick James

November 03, 2009 09:32AM

Re: Partitioning with range(to_days) issue

2964

Jools Lebowski

January 13, 2010 05:17AM

Re: Partitioning with range(to_days) issue

2882

Rick James

January 22, 2010 11:09PM

Re: Partitioning with range(to_days) issue

3388

Jason Shoulet

November 05, 2009 03:12AM

Re: Partitioning with range(to_days) issue

3275

Rick James

November 05, 2009 08:37AM

Re: Partitioning with range(to_days) issue

4184

Jason Shoulet

November 05, 2009 09:51AM

Re: Partitioning with range(to_days) issue

3073

Mattias Jonsson

November 17, 2009 05:51AM

Re: Partitioning with range(to_days) issue

3252

Sarah Sproehnle

November 05, 2009 07:47PM

Re: Partitioning with range(to_days) issue

2933

Jools Lebowski

January 13, 2010 06:36AM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.