MySQL Forums
Forum List  »  Performance

Re: Questions about performance for huge table (100M+ rows)
Posted by: Rick James
Date: January 03, 2009 12:38AM

* Change "hash" to just VARBINARY. Utf8 is overkill. Checking sha1 for dup "content" is practical. Using FULLTEXT is not practical--for that purpose.

* IP -- ip address? Make it VARBINARY instead of utf8.

* Ditto for name, since it is base-36.

* InnoDB will be 3x bigger than MyISAM.

* Adding columns later -- this requires an ALTER TABLE, which has to copy the table over and recreate the indexes. This may take hours for a multi-GB table.

* What is the average size of "content"? It _may_ be better to put it in a parallel table. Especially if you do lengthy SELECTs that don't need it. Wide, little needed fields is my first choice for "spliting" (vertical partitioning) a table.

* Horizontal partitioning (eg, splitting on first char of hash)T is of little use; don't bother.

* At 1-2 queries/sec, MySQL is sleeping. 1000/sec would raise an eyebrow. OTOH, if the SELECTs or UPDATES hit lots of rows, it could be worse.

* Table locks are not an issue until you get hundreds of queries/sec.

* How much RAM? At some point, your 3 indexes will be thrashing disk I/O. Why? You will be hitting 3 different places on each INSERT. The PK will be clumped at the end of the table (auto_increment); the other two will probably be disk hits. But if you are talking about a few INSERTS/sec, no problem. 50/sec might max out the disk.

* How many rows will you SELECT/UPDATE at a time? How complicate is the WHERE? Why is it a SELECT plus UPDATE?

* I don't think your comments about avoiding UNIQUE are valid.

* Index all the fields? NO. First, indexing extra fields does not necessarily help SELECTs. Second, (as already mentioned), each index probably means a disk hit during INSERT and UPDATE (if that field changes). Show us some of the SELECTs you anticipate. If your WHERE clause picks 100 rows, and then you have ORDER BY, don't worry about indexes for performance. OTOH, if the WHERE picks 100K rows, having the ORDER BY field in the index _might_ help a lot. But note -- "in the index", that is, as a compound index, not as a separate index.

* If you are doing SELECT * FROM tbl WHERE special=1 ORDER BY xxx LIMIT 1000,10, you are in trouble.

* If "name" is unique, why have "id"? Is there some other table? Do you need to prune this table?

* (tip) Is "updated" a timestamp of when you change the row? Take a look at TIMESTAMP; in particular its default.

* (tip) If you go with MyISAM, read about MAX_ROWS in CREATE TABLE. (Otherwise, you will probably hit a default size limit.)

Options: ReplyQuote




Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.