MySQL Forums

Re: Scale question FTS, for planning purposes
Posted by: Rick James
Date: May 02, 2011 08:22PM

(Insert back of envelope...)

1B rows, 250 chars each, plus FULLTEXT index. That will be pushing 1TB of disk.
Conclusion: Virtually any query will have to hit the disk.
FT is likely to need to poke around quite a bit.
(writing on envelope) 10 disk hits per row.
Times 10ms per disk hit (unless you have RAID or SSD)
(pulling out calculator) That's 0.1 sec per row.
A billion rows.
(more calculator) That's about 3 years to check every row.
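
For anyone who wants to redo the envelope with their own numbers, here is a minimal Python sketch of the same arithmetic. The function name `years_for_rows` and the constant are made up for illustration; the 10 hits per row and 10ms per hit are the guesses from above, not measurements.

```python
# Back-of-envelope: time to touch every row when each row costs disk hits.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_for_rows(rows, disk_hits_per_row, seconds_per_hit=0.010):
    """Estimate elapsed years if every row needs a fixed number of disk hits."""
    seconds_per_row = disk_hits_per_row * seconds_per_hit  # 10 * 10ms = 0.1 sec
    return rows * seconds_per_row / SECONDS_PER_YEAR

# Assumed inputs from the envelope: 1B rows, 10 hits per row, 10ms per hit.
scan_years = years_for_rows(rows=1_000_000_000, disk_hits_per_row=10)
print(f"{scan_years:.1f} years")  # prints "3.2 years"
```

Dropping `seconds_per_hit` to something like 0.0001 (a rough guess for SSD) shows how much the answer hinges on that one number.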

(new envelope) Let's see how long it will take to insert the billion rows.
250 chars / 5 chars = 50 words per row.
That's 50 probes into the FT index's BTrees.
But, let's say that 30 of them are so common that they stay in cache.
Now we are at 20 disk hits per row inserted.
(replacing batteries in calculator; never mind, it is solar powered) At 10ms per hit, that's 0.2 sec per row, or about 6 years to insert a billion rows.
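
The insert envelope can be sketched the same way. Again, this is only the arithmetic above in Python; `insert_years` is a hypothetical helper, and the 5 chars per word and the 30 cached words per row are the assumptions from the envelope, not anything measured.

```python
# Back-of-envelope: time to insert rows into a FULLTEXT-indexed table.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def insert_years(rows, chars_per_row, chars_per_word, cached_words,
                 seconds_per_hit=0.010):
    """Estimate years to insert, counting one index probe per word in the row."""
    words_per_row = chars_per_row // chars_per_word      # 250 / 5 = 50 probes
    disk_hits = max(words_per_row - cached_words, 0)     # 50 - 30 = 20 hit disk
    return rows * disk_hits * seconds_per_hit / SECONDS_PER_YEAR

# Assumed inputs: 1B rows, 250 chars per row, 5 chars per word, 30 cached words.
load_years = insert_years(1_000_000_000, 250, 5, cached_words=30)
print(f"{load_years:.1f} years")  # prints "6.3 years"
```

The cached-word guess dominates: if 45 of the 50 words stayed in cache, the same formula gives a quarter of the time.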

Although this analysis was centered around MySQL and FULLTEXT, the number of disk hits is going to be somewhat similar for any approach you take. To achieve your goal (before you get bored with it), you will need to distribute the data among many machines, and develop fancier algorithms than any off-the-shelf software can provide. Look at how many thousands of machines and engineers Google, Yahoo, and Microsoft each have working on the "search" problem.
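
To get a feel for "many machines", here is a rough sketch that divides the insert workload evenly across machines. It assumes perfect linear scaling with no coordination overhead, which real systems will not achieve, so treat the result as a floor. The 30-day deadline and the name `machines_needed` are purely illustrative.

```python
import math

def machines_needed(total_seconds, deadline_days):
    """Machines required if the work splits perfectly evenly (optimistic)."""
    deadline_seconds = deadline_days * 24 * 3600
    return math.ceil(total_seconds / deadline_seconds)

# Insert workload from the envelope: 1B rows * 20 disk hits * 10ms = 2e8 sec.
total_insert_seconds = 1_000_000_000 * 20 * 0.010
print(machines_needed(total_insert_seconds, deadline_days=30))  # prints 78
```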



