MySQL :: Re: Questions about performance for huge table (100M+ rows)

New Topic

Re: Questions about performance for huge table (100M+ rows)

Posted by: Vince Gatto
Date: January 03, 2009 01:17AM

1. Definitely use InnoDB. You should read up on the various configuration parameters that can be used to tune InnoDB performance. You should also understand using transactions, transaction isolation and how to avoid and handle deadlocks. Make sure you know what a clustered index is and the implications of choosing your primary key when using InnoDB.

2. If you're trying to do an exact comparison, FULLTEXT indexes are not what you want. Your hash will be way faster than a FULLTEXT index, as long as you create an index on your hash column. Also, InnoDB tables do not support fulltext search, so if you go with InnoDB, you'll have to stay away from FULLTEXT indexes.

3. Splitting up tables by rows (horizontal partitioning) is useful when you know the vast majority of queries will only need to access one of your tables. Otherwise, you're hurting performance and adding extra code complexity. This isn't something you should probably do up front until you've exhausted all other avenues of improving performance.

As far as splitting the information into another table (vertical partitioning), it really all depends on what your query load looks like. Most of the time, having a smaller, pure subset of another table doesn't really help. I wouldn't experiment with this ahead of horizontal partitioning

4. Go for the UNIQUE index. Not because of performance, but because it will perform a cheap integrity check of your assumptions. You only get 1:2^63 probability of collision if your code works perfectly. There's plenty of ways to screw up, so having a little safeguard never hurts. What you should do is use HASH indexes, rather than BTREE, since you'll have no need of range scanning when the data your indexing is opaque.

5. The problem with indexing everything isn't so much about the updating, though that can be a problem if you need high write throughput, its about how InnoDB caches index blocks in memory. All the indexes you create will be competing for memory. As you need to read new blocks, you have to push old blocks out of memory. If you're accessing many different parts of many different indexes, and you can't hold them all in memory, you'll end up with a lot of disk access. Be especially careful of VARCHAR columns, since they often force MySQL to create disk-based temporary tables (slow) instead of in memory. If your data is almost always the max length anyway, use CHAR instead. You'll waste a little space, but you'll be less likely to go to disk if you sort or group by that column.

I have no idea what your application is, but remember to really question your requirements. Do users really need to sort by everything? Do updates need to be reflected in real time, or can you batch them up and only update every 10 minutes or every hour? A really scalable system isn't just about tuning the software and operations, its also about tuning the product, and making sure the value being offered is worth the cost of offering it.

Hope that helps.

Navigate: Previous Message• Next Message

Options: Reply• Quote

Subject

Views

Written By

Posted

Questions about performance for huge table (100M+ rows)

12520

Vladan Stefanovic

January 02, 2009 08:20PM

Re: Questions about performance for huge table (100M+ rows)

4765

Rick James

January 03, 2009 12:38AM

Re: Questions about performance for huge table (100M+ rows)

3266

Boyd Hemphill

February 03, 2009 10:01AM

Re: Questions about performance for huge table (100M+ rows)

3295

Vince Gatto

January 03, 2009 01:17AM

Re: Questions about performance for huge table (100M+ rows)

2856

Rick James

January 03, 2009 04:58PM

Re: Questions about performance for huge table (100M+ rows)

2780

Vladan Stefanovic

January 03, 2009 06:02PM

Re: Questions about performance for huge table (100M+ rows)

2448

Rick James

January 04, 2009 07:35PM

Re: Questions about performance for huge table (100M+ rows)

2998

Vladan Stefanovic

January 05, 2009 09:53AM

Re: Questions about performance for huge table (100M+ rows)

2732

Rick James

January 05, 2009 06:49PM

Re: Questions about performance for huge table (100M+ rows)

3035

Vladan Stefanovic

January 05, 2009 09:21PM

Re: Questions about performance for huge table (100M+ rows)

2617

Rick James

January 06, 2009 10:00PM

Re: Questions about performance for huge table (100M+ rows)

2478

Victor Andres

January 21, 2009 11:50AM

Re: Questions about performance for huge table (100M+ rows)

2783

Rick James

January 21, 2009 11:04PM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.