MySQL Forums

Re: Questions about performance for huge table (100M+ rows)
Posted by: Vince Gatto
Date: January 03, 2009 01:17AM

1. Definitely use InnoDB. You should read up on the various configuration parameters that can be used to tune InnoDB performance. You should also understand using transactions, transaction isolation and how to avoid and handle deadlocks. Make sure you know what a clustered index is and the implications of choosing your primary key when using InnoDB.
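
As a sketch of those points (table and column names here are hypothetical, not from the original question), an InnoDB table with a deliberately chosen clustered primary key, plus a transaction grouping related writes, might look like:

```sql
-- Hypothetical schema: InnoDB clusters rows on the primary key, so
-- choosing (user_id, created_at, doc_id) stores each user's documents
-- physically together on disk.
CREATE TABLE document (
    user_id    INT UNSIGNED NOT NULL,
    created_at DATETIME     NOT NULL,
    doc_id     INT UNSIGNED NOT NULL,
    body       MEDIUMTEXT,
    PRIMARY KEY (user_id, created_at, doc_id)
) ENGINE=InnoDB;

-- Group related writes in one transaction. If MySQL reports a
-- deadlock (error 1213), the application should retry the whole block.
START TRANSACTION;
INSERT INTO document VALUES (42, NOW(), 1, 'first draft');
UPDATE user_stats SET doc_count = doc_count + 1 WHERE user_id = 42;
COMMIT;
```

The `user_stats` table is just an illustration of a second write that belongs in the same transaction.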

2. If you're trying to do an exact comparison, FULLTEXT indexes are not what you want. Your hash will be way faster than a FULLTEXT index, as long as you create an index on your hash column. Also, InnoDB tables do not support fulltext search, so if you go with InnoDB, you'll have to stay away from FULLTEXT indexes.
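
To make that concrete (again with hypothetical names), the exact-match pattern is: index the hash column, then look up by hash and re-check the full value to guard against the small chance of a collision:

```sql
-- Hypothetical: store a SHA-1 of the content and index it, then do
-- exact-match lookups against the hash instead of a FULLTEXT search.
ALTER TABLE document
    ADD COLUMN content_sha1 CHAR(40) NOT NULL,
    ADD INDEX idx_content_sha1 (content_sha1);

-- Compare the hash first (uses the index), then the full text,
-- so a hash collision can never return a wrong row.
SELECT doc_id
FROM document
WHERE content_sha1 = SHA1('some exact text')
  AND body = 'some exact text';
```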

3. Splitting up tables by rows (horizontal partitioning) is useful when you know the vast majority of queries will only need to access one of the resulting tables. Otherwise, you're hurting performance and adding extra code complexity. This probably isn't something you should do up front; wait until you've exhausted all other avenues of improving performance.

As for splitting columns off into another table (vertical partitioning), it all depends on what your query load looks like. Most of the time, having a smaller, pure subset of another table doesn't really help. I wouldn't experiment with this ahead of horizontal partitioning.
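
For what a manual horizontal split looks like in practice (a sketch, assuming a `document` table and routing on `doc_id`, both hypothetical):

```sql
-- Hypothetical manual horizontal split: route rows to one of two
-- tables by key. Queries that know the key touch only one table;
-- queries that don't must UNION across both -- that's the cost.
CREATE TABLE document_0 LIKE document;
CREATE TABLE document_1 LIKE document;

-- One-time migration; afterwards the application computes the table
-- name as CONCAT('document_', doc_id % 2) before each query.
INSERT INTO document_0 SELECT * FROM document WHERE doc_id % 2 = 0;
INSERT INTO document_1 SELECT * FROM document WHERE doc_id % 2 = 1;
```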

4. Go for the UNIQUE index. Not because of performance, but because it performs a cheap integrity check of your assumptions. You only get that 1:2^63 probability of collision if your code works perfectly, and there are plenty of ways to screw up, so a little safeguard never hurts. You could also ask for a HASH index rather than BTREE, since you'll have no need of range scans when the data you're indexing is opaque (note, though, that InnoDB silently builds a BTREE even if you write USING HASH; true HASH indexes require the MEMORY engine).
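
A sketch of that safeguard, reusing the hypothetical `content_sha1` column:

```sql
-- Hypothetical: enforce the "no duplicate hashes" assumption in the
-- database itself. A broken assumption then surfaces as error 1062
-- (duplicate key) instead of silently corrupt data.
-- USING HASH is only a hint here: InnoDB builds a BTREE regardless.
ALTER TABLE document
    ADD UNIQUE KEY uk_content_sha1 (content_sha1) USING HASH;
```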

5. The problem with indexing everything isn't so much the updating, though that can be a problem if you need high write throughput; it's about how InnoDB caches index blocks in memory. All the indexes you create will be competing for memory: as you read new blocks, you push old blocks out. If you're accessing many different parts of many different indexes, and you can't hold them all in memory, you'll end up with a lot of disk access. Be especially careful with VARCHAR columns, since they often force MySQL to create disk-based temporary tables (slow) instead of in-memory ones. If your data is almost always the max length anyway, use CHAR instead. You'll waste a little space, but you'll be less likely to go to disk when you sort or group by that column.
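
For the CHAR-vs-VARCHAR point, a fixed-length value like a hex hash is the clearest case (hypothetical column again):

```sql
-- Hypothetical: a SHA-1 hex digest is always exactly 40 characters,
-- so CHAR(40) wastes nothing and, per the point above, is friendlier
-- to the internal temporary tables MySQL builds for sorts and groups.
ALTER TABLE document
    MODIFY content_sha1 CHAR(40) NOT NULL;
```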

I have no idea what your application is, but remember to really question your requirements. Do users really need to sort by everything? Do updates need to be reflected in real time, or can you batch them up and only apply them every 10 minutes or every hour? A really scalable system isn't just about tuning the software and operations; it's also about tuning the product, and making sure the value being offered is worth the cost of offering it.

Hope that helps.
