
Re: Huge Dataset (perhaps?) made for Range Partitioning (Long)
Posted by: Mikael Ronström
Date: September 22, 2005 01:54PM

Hi,
It certainly sounds like an application suitable for partitioning.
I presume you could do something like this

CREATE TABLE LARGE_TABLE (....)
PARTITION BY LIST(DAYOFYEAR(C)+(YEAR(C)-2005)*366)
(PARTITION p0 VALUES IN (121));

Assuming that you want to start with the 121st day of the year.
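A quick sanity check of the mapping (the dates here are just illustrative):

SELECT DAYOFYEAR('2005-05-01') + (YEAR('2005-05-01') - 2005) * 366; -- 121
SELECT DAYOFYEAR('2006-05-01') + (YEAR('2006-05-01') - 2005) * 366; -- 487

The *366 factor keeps values from different years apart, since no year has more
than 366 days.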
Then each new day you add a new partition with:
ALTER TABLE LARGE_TABLE ADD PARTITION (PARTITION px VALUES IN (121 + day_number));
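Note that day_number stands for a value computed by whatever script issues the
statement; the server expects a concrete number in the VALUES IN list. For the
122nd day, for example, the script would send:

ALTER TABLE LARGE_TABLE ADD PARTITION (PARTITION p1 VALUES IN (122));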

Naturally, range partitioning could be used as well, and you can choose
daily partitions, weekly partitions, or any other interval between them.
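For the daily range variant, a sketch could look like this (partition names
and dates are only examples; TO_DAYS() turns the datetime into a steadily
increasing day number):

CREATE TABLE LARGE_TABLE (....)
PARTITION BY RANGE (TO_DAYS(C))
(PARTITION p20050501 VALUES LESS THAN (TO_DAYS('2005-05-02')),
 PARTITION p20050502 VALUES LESS THAN (TO_DAYS('2005-05-03')));

and each new day:

ALTER TABLE LARGE_TABLE ADD PARTITION
(PARTITION p20050503 VALUES LESS THAN (TO_DAYS('2005-05-04')));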

The current 5.1 version has this syntax, but the optimised version is still a few
weeks away, so don't try it on really large tables yet.

Looking at the table, I have a few questions:

1) How important is the primary key? To enforce the primary key constraint, the
field used in the partition function must be part of the primary key. Otherwise
the server essentially has to check every partition for uniqueness, which gets
burdensome when you have hundreds of partitions.

So could the primary key be (A_key, C) instead, assuming that C is used in the partition function?
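In terms of your table definition that would just mean changing (a sketch,
everything else unchanged):

PRIMARY KEY (`A_key`)

to

PRIMARY KEY (`A_key`, `C`)

With C in the key, a uniqueness check never has to look outside the single
partition that the row maps to.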

Rgrds Mikael

JinNo Kim wrote:
> We were given storage space on a NAS solution
> accessed via NFS, and decided to pursue a
> challenge that we'd been toying with for a long
> time - centralizing the storage of millions of
> transactions a day (10-30 million seems to be
> pretty average after 90 days of operation - low of
> 6.9M, high of 60M). We prepared our remote
> applications to output hourly batched insert
> statements, compress, retrieve and insert into a
> new database at our datacenter. By thinking
> through the database design pretty carefully, we
> end up with only one table that needs to do the
> bulk of the work.
>
> This table looks like:
>
> CREATE TABLE LARGE_TABLE (
>   A_key bigint(20) unsigned NOT NULL auto_increment,
>   B smallint(5) unsigned NOT NULL default '0',
>   C datetime NOT NULL default '0000-00-00 00:00:00',
>   D int(10) unsigned NOT NULL default '0',
>   E bigint(20) unsigned NOT NULL default '0',
>   F tinyint(3) unsigned NOT NULL default '0',
>   G smallint(5) unsigned NOT NULL default '0',
>   H bigint(20) unsigned NOT NULL default '0',
>   I smallint(5) unsigned NOT NULL default '0',
>   J bigint(20) unsigned NOT NULL default '0',
>   K int(10) unsigned NOT NULL default '0',
>   L bigint(20) unsigned NOT NULL default '0',
>   M datetime NOT NULL default '0000-00-00 00:00:00',
>   PRIMARY KEY (`A_key`),
>   KEY C (`C`),
>   KEY D (`D`),
>   KEY G (`G`),
>   KEY I (`I`),
>   KEY K (`K`)
> ) TYPE={varied during testing}.
>
> Initially, we created the table as a MyISAM table
> and set MAX_ROWS=4294967295 - played with InnoDB
> when things started getting weird (see post at
> http://forums.mysql.com/read.php?24,43227,43227#msg-43227
> for more details about the badness).
>
> Basically, at ~60 days with ~1.6 billion rows in
> LARGE_TABLE, the system became virtually unusable.
> After repeated crashes, the InnoDB tablespace was
> so corrupt MySQL would no longer start (we did
> manage to recover all of the data) and I
> re-approached the problem. The recovery literally
> finished this week and our new batched
> inserter/gatherer is _still 'set -x' in the cron
> job.
>
> We are now relying on MyISAM tables created daily
> on the local RAID 5 array (the system was only
> supposed to write here for temp tables - long
> story) with PACK_KEYS=1, and we myisampack every
> daily file over 3 days old (through the INSERT
> cronjob). I create a MERGE table for each month's
> daily tables (and tried and failed to cascade
> MERGE tables today - worth a shot; I guess it's
> back to unions of the MERGE definitions if those
> will go). Our data going back to early June topped
> 2.3 billion rows earlier this week and will break
> 2.4 billion tomorrow (probably).
>
>
> While this approach is working, the 127-table
> limit on the global MERGE definition is rapidly
> approaching, and I'd really like to just continue
> accessing the data as a single table.
>
> We will soon begin developing the tablespace and
> queries for mining this data - and for that
> application, rapid response is far less important
> than being able to hit a vast amount of data.
> When Mr. DeSouza pointed me here from my post in
> the performance forum, range partitioning looked
> like it might meet my needs well, if it will be
> possible to define partitions down to daily
> ranges :)
>
> It's tempting to burn another server and build 5.1
> from source, just to see if and how partitioning
> will perform for this application. I guess this
> has gone on long enough - I welcome any input that
> any of you can provide. Does this sound like an
> application/DB made for partitioning?
>
> If there's more information that you need, I will
> comply to the best of my abilities within the NDA
> that binds so many of us (myself included)...
>
> Thanks for any replies,
>
> -JNK

Mikael Ronström
Senior Software Architect, MySQL AB
My blog: http://mikaelronstrom.blogspot.com
