MySQL Forums
Forum List  »  Performance

Re: word index - is there a better solution?
Posted by: Rick James
Date: June 05, 2010 12:41PM

"+ an index on each single column " -- this is usually wrong. Please do
SHOW CREATE TABLE (instead of DESCRIBE)
so I can see the actual indexes and the engine. Compound indexes are usually better. Of course, it depends on the SELECTs, so let's see them.

My mantra for large scale databases: "count the disk hits".

If you have a 40TB table with 5 indexes, that is probably about 5 disk hits for every INSERT. On ordinary drives, you would be lucky to INSERT any faster than 20/sec. Let's see, one trillion inserts at 20/sec 30M sec/yr -- That would take about 1600 YEARS to populate the table. We need to talk seriously about how this will be done. AND, minimize the number of indexes on that table! AND let's discuss each field to see if it is really needed and/or can be shrunk. For example, language... How many different languages? Probably few enough to fit into a TINYINT UNSIGNED (256); that would save 2 bytes per row. Maybe you can get rid of the language field by having a different table for each language (saving 4 bytes if InnoDB).

As for SSDs, check out Verident; they have about the fastest around. But there current projection is only 1TB per machine.

If you use SSDs, traditional wisdom about elevator strategies and file systems go out the window.

Oh, with SSDs, your 40TB's worth of inserts will take only a century.

MyISAM would be 2x-3x smaller for that kind of table. What it the PRIMARY KEY? If you use InnoDB, explicitly state a PK, and pick it wisely. If you don't have one, InnoDB will tack on a 6-byte integer for you. And it will be tacked onto every secondary key!

Some benchmarks show more than 50% improvement (in stress tests) when using the plugin.

"Sharding". Can this data be split among multiple machines? I was impressed with that talk given by ___ (don't remember) at the Conf. They have a federation/partition setup that looks workable. That is, they promise sharding with minimal effort on your part.

PARTITION is another option, but test carefully -- in casual use, it fails to provide any performance improvement.

Scaling is my passion. Usually I encounter the simple case of data getting bigger than cache and suddenly (and mysteriously to the designer) falling off a performance cliff. Your app carries the issue an order of magnitude farther. Interesting.

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: word index - is there a better solution?
1414
June 05, 2010 12:41PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.