Re: A lot of rows for a very specific task
Posted by: Rick James
Date: January 19, 2009 12:01AM

MyISAM would take 16GB for the table. An index on (a) would take another 12-16GB. Alternatively an index on (a,b,c) would take about 25GB, but the SELECT could be done without touching the data.

Since you say the data is static, I would go with the (a,b,c) index. This index would be the "working set" of about 25GB.

InnoDB MUST have a PRIMARY KEY. If you don't provide one, a BIGINT will be provided for you. The data (and PK) would total about 100GB. Then an index on (a) or (a,b,c) would take a similar amount of space as for MyISAM.

If (a,c) is UNIQUE, then it could be the PK for InnoDB. Then you would not need a secondary key, and the total space would be 100GB. And that would be the size of the working set.

If you have, say, a 8GB of RAM and allocate 6GB for caching, that's about 1/4 of what is needed for MyISAM, or about 1/16 of what is needed for InnoDB. MyISAM wins.

Now, if your queries are _random_, then you will be disk bound. About 1/4 of the time (assuming MyISAM and 6GB), the necessary block(s) will be in cache.

So, 10K queries, 1/4 of which are in RAM -> 7500 disk hits in "serveral seconds". Let's say 10 sec. Now you need disk drives that can read at 750/sec. SATA -- no way. RAID+SCSI? If you can get a 14-disk raid system striped, you might sustain your required rate. This assumes you can do lots of queries in parallel.

Other notes... BTree is usually the best, and usually your only choice in MySQL.

Operating system -- does not matter much.

Cluster. I don't know the formulas for space in Cluster. The indexes all have to stay in RAM. So, if you can configure a Cluster with 50(?) GB of RAM among the machines, it should work. And it should not(?) hit the disk, so my I/O comments don't apply.

Alternatively, there are some vendors working on Flash memory, even up to 1TB, for a single machine. $$$

Another approach -- Multiple machines, with the data sharded (partitioned) on "a" among the machines. This would require your application to 'compute' which MySQL box to hit. Each box would have a subset of the data, and arrange for it all to be cached.

Options: ReplyQuote

Written By
Re: A lot of rows for a very specific task
January 19, 2009 12:01AM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.