
Huge Dataset (perhaps?) made for Range Partitioning (Long)
Posted by: JinNo Kim
Date: September 21, 2005 09:46AM

Edwin DeSouza wrote:
> JNK,
> Your scenario is interesting/challenging enough to
> have its own thread.
>
> Can you start a new thread here with more details
> about your scenario. That will allow us to
> explore various Partitioning options to see which
> best fits your scenario.
>
> - Edwin
>

I'm new to these forums, primarily because I've never been this challenged by a database requirement. Three years ago, we convinced our IT department head to let us "test" a webapp deployment using Tomcat+MySQL in lieu of Oracle (we were a dedicated Java+Oracle shop until then). We've had huge successes with our internal applications and have been migrating more of our day-to-day functions to MySQL. My team and I have been able to research our way out of most of the small bumps in the road, but our most recent project (which, thankfully, was still in beta) slammed up against some serious walls fast. Now for the fun...

We were given storage space on a NAS solution accessed via NFS, and decided to pursue a challenge we'd been toying with for a long time: centralizing the storage of millions of transactions a day (10-30 million seems to be pretty average after 90 days of operation, with a low of 6.9M and a high of 60M). We set up our remote applications to output hourly batched INSERT statements, which we compress, retrieve, and insert into a new database at our datacenter (a sketch of one batch follows the table definition below). By thinking through the database design pretty carefully, we end up with only one table that needs to do the bulk of the work.

This table looks like:

CREATE TABLE LARGE_TABLE (
A_key bigint(20) unsigned NOT NULL auto_increment,
B smallint(5) unsigned NOT NULL default '0',
C datetime NOT NULL default '0000-00-00 00:00:00',
D int(10) unsigned NOT NULL default '0',
E bigint(20) unsigned NOT NULL default '0',
F tinyint(3) unsigned NOT NULL default '0',
G smallint(5) unsigned NOT NULL default '0',
H bigint(20) unsigned NOT NULL default '0',
I smallint(5) unsigned NOT NULL default '0',
J bigint(20) unsigned NOT NULL default '0',
K int(10) unsigned NOT NULL default '0',
L bigint(20) unsigned NOT NULL default '0',
M datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`A_key`),
KEY C (`C`),
KEY D (`D`),
KEY G (`G`),
KEY I (`I`),
KEY K (`K`)
) ENGINE={varied during testing};
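
For reference, each hourly batch file we generate is basically a stream of INSERTs against this table. A minimal sketch (the multi-row form and all the values here are invented for illustration; A_key is omitted so auto_increment assigns it):

INSERT INTO LARGE_TABLE (B, C, D, E, F, G, H, I, J, K, L, M) VALUES
(12, '2005-09-21 08:00:01', 1001, 77001, 3, 17, 8086, 2, 550042, 7, 99, '2005-09-21 08:00:01'),
(12, '2005-09-21 08:00:02', 1002, 77002, 1, 17, 8087, 2, 550043, 7, 99, '2005-09-21 08:00:02');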

Initially, we created the table as a MyISAM table and set MAX_ROWS=4294967295, then played with InnoDB when things started getting weird (see the post at http://forums.mysql.com/read.php?24,43227,43227#msg-43227 for more details about the badness).
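
For anyone following along, that's an ordinary table option; since the rows here are fixed-width, MAX_ROWS alone should be enough to widen MyISAM's internal row pointer past the default 4GB data-file limit:

ALTER TABLE LARGE_TABLE MAX_ROWS=4294967295;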

Basically, at ~60 days and ~1.6 billion rows in LARGE_TABLE, the system became virtually unusable. After repeated crashes, the InnoDB tablespace was so corrupt that MySQL would no longer start (we did manage to recover all of the data), and I re-approached the problem. The recovery literally finished this week, and our new batched inserter/gatherer is *still* running under 'set -x' in its cron job.

We are now relying on MyISAM tables created daily on the local RAID 5 array (the system was only supposed to write temp tables there - long story), built with PACK_KEYS=1; the INSERT cron job runs myisampack on every daily file more than 3 days old. I create a MERGE table over each month's daily tables (I tried and failed to cascade MERGE tables today - worth a shot, I guess; back to UNIONs of the MERGE definitions if those will work). Our data going back to early June topped 2.3 billion rows earlier this week and will probably break 2.4 billion tomorrow.
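
In case it helps anyone picture the layout, one monthly wrapper looks roughly like this (a sketch - the daily table names are illustrative, and the column definitions have to match the underlying MyISAM dailies):

CREATE TABLE LARGE_TABLE_200509 (
A_key bigint(20) unsigned NOT NULL,
B smallint(5) unsigned NOT NULL default '0',
C datetime NOT NULL default '0000-00-00 00:00:00',
D int(10) unsigned NOT NULL default '0',
E bigint(20) unsigned NOT NULL default '0',
F tinyint(3) unsigned NOT NULL default '0',
G smallint(5) unsigned NOT NULL default '0',
H bigint(20) unsigned NOT NULL default '0',
I smallint(5) unsigned NOT NULL default '0',
J bigint(20) unsigned NOT NULL default '0',
K int(10) unsigned NOT NULL default '0',
L bigint(20) unsigned NOT NULL default '0',
M datetime NOT NULL default '0000-00-00 00:00:00',
KEY C (C),
KEY D (D)
) ENGINE=MERGE UNION=(LARGE_TABLE_20050901, LARGE_TABLE_20050902) INSERT_METHOD=LAST;

Since cascading MERGEs of MERGEs didn't work, the global definition is a single MERGE over every daily table directly - which is where the table-count ceiling below comes in.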

While this approach is working, the 127-table limit in the global MERGE definition is rapidly approaching, and I'd really like to continue accessing the data as a single table.

We will soon begin developing the tablespace and queries for mining this data - and for that application, rapid response is far less important than being able to hit a vast amount of data. When Mr. DeSouza pointed me here from my post to the performance forum, range partitioning looked like it might meet my needs well, if it's possible to define partitions down to daily ranges :)
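
From the 5.1 preview docs, daily ranges do look expressible - something like this sketch (the partition names and the TO_DAYS() scheme are just my guess at how we'd lay it out, and 5.1 apparently requires the partitioning column in every unique key, so the primary key grows to (A_key, C)):

CREATE TABLE LARGE_TABLE_P (
A_key bigint(20) unsigned NOT NULL auto_increment,
B smallint(5) unsigned NOT NULL default '0',
C datetime NOT NULL default '0000-00-00 00:00:00',
D int(10) unsigned NOT NULL default '0',
E bigint(20) unsigned NOT NULL default '0',
F tinyint(3) unsigned NOT NULL default '0',
G smallint(5) unsigned NOT NULL default '0',
H bigint(20) unsigned NOT NULL default '0',
I smallint(5) unsigned NOT NULL default '0',
J bigint(20) unsigned NOT NULL default '0',
K int(10) unsigned NOT NULL default '0',
L bigint(20) unsigned NOT NULL default '0',
M datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (A_key, C),
KEY D (D),
KEY G (G),
KEY I (I),
KEY K (K)
) ENGINE=MyISAM
PARTITION BY RANGE (TO_DAYS(C)) (
PARTITION p20050919 VALUES LESS THAN (TO_DAYS('2005-09-20')),
PARTITION p20050920 VALUES LESS THAN (TO_DAYS('2005-09-21')),
PARTITION pmax VALUES LESS THAN MAXVALUE
);

If that works, expiring a day should become a single ALTER TABLE LARGE_TABLE_P DROP PARTITION p20050919; instead of juggling MERGE definitions.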

It's tempting to burn another server and build 5.1 from source, just to see whether and how partitioning performs for this application. I guess this has gone on long enough - I welcome any input you can provide. Does this sound like an application/DB made for partitioning?

If there's more information you need, I will provide it to the best of my ability within the NDA that binds so many of us (myself included)...

Thanks for any replies,

-JNK
