MySQL Forums
Forum List  »  Partitioning

Re: Partioning based on string column
Posted by: Allan Winston
Date: August 11, 2009 08:02PM

Yogesh,

I have just been using MySQL for a couple of weeks for a home project after having done performance work with the DB2 database a few years ago.

I found your question interesting, because it bears on the database I am designing for the stock market where the stock ticker is the first part of the key. After being discouraged by the size of my MyISAM files, I decided to experiment with the ARCHIVE engine. I am getting about 22% better compression than I got with the MyISAM engine with the best compression I could achieve.

Switching to the Archive engine, I was experimenting with a database of 5.6 million rows, with an input data size of 630 MB for the CSV file to be processed by LOAD DATA INFILE. The unpartitioned load created a table of about 181 MB. I then told it to partition on the ticker:

ENGINE=ARCHIVE

PARTITION BY KEY(Ticker)
PARTITIONS 64

Due to the vagaries of file allocation, the total size in this instance grew slightly to 188 MB, spread over the 64 partitions. The partitions were not of equal size, which I anticipated. The partitions varied in size from 0.8 MB to 4.7 MB.

My expectation was that even though the rows were inserted into various partitions that I would still not get effective partition pruning, since I was not partitioning on an integer. I was proved wrong by the EXPLAIN PARTITIONS and by the actual running times. For example, the statement
SELECT COUNT(*) FROM table_name WHERE ticker = 'AAPL" ran in 30.65 seconds on the unpartitioned table while it ran in 1.17 seconds on the partitioned table.

It seems to me the best solution for your application would be to add a new column to your table that contains the first two letters of the last name and make that column the partitioning column. This will make the queries much more user-friendly by not requiring the use of an artificial integer "code".

As a retired performance analyst, I would recommend trying to distribute the location of these partitions over multiple disks, if that option is available. That way, you can have multiple queries operating at the same time, much of the time on different packs and avoiding seek contention dring concurrent queries on the last name. If your disk environment is RAID, then things get more complicated and you will need to consult someone familiar with your shop's DASD layout.

While I chose 64 partitions, there is nothing magical about that choice. You will need to choose a number based on your database size and DASD availability.

Regards,
Allan

Options: ReplyQuote


Subject
Views
Written By
Posted
3288
August 07, 2009 04:17PM
Re: Partioning based on string column
2759
August 11, 2009 08:02PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.