MySQL Forums
Forum List  »  Full-Text Search

Re: Multiple Databases - Search and retrieve.
Posted by: Rick James
Date: July 12, 2011 12:23AM

90% of database systems have fewer than 10 databases and fewer than 200 tables (across all databases). (Disclosure: Those numbers are guesstimates.)

A system with 70K databases or 70K tables is "not designed right".

> Using PHP 5.2.17 and MySql 5.0, can have multiple DBs but of fixed(limited) size.
Yeah, the limit is something like 64 TERABYTES. PER TABLE! Where did you get "but of fixed(limited) size"?

Once you get over a terabyte, you get into performance issues. But they are usually founded in disk speed, not MySQL limitations.

> I want to search using "LIKE '%ab%'"
That's an awful query. MySQL can't do it efficiently; it will essential have to read every row, and scan the column in question.

On the other hand, if you are really searching for words in text fields, FULLTEXT does a much better job. (See also Lucene and Sphinx.)

A common technique for solving otherwise-intractable database problems is to "meet in the middle". That is, do some extra work when INSERTing the data, so that the table and index are much easier for the SELECTs to use.

I mentioned one way of doing that -- collect all the text into a single table, just for this search query.

Another example is FULLTEXT, itself. When you INSERT, it parses the text, stuffing indexing info for each word it finds. The SELECT then uses this pre-digested info.

If you really need to search for 'ab' anywhere in a string, then I would look into digraphs, and create bit strings for each record. Each bit represents one pair of letters. (This is very sketchy, and would take a lot of work on your part to implement.)

Another way to make an intractable problem more tractable is to kill off some of the requirements. FULLTEXT requires that -- it will only find "words", not parts of words, and it does not want to find words shorter than 4 characters, nor words in its 'stoplist'. (These restrictions can be overcome--at a cost.) FULLTEXT effectively says "You don't need to look for the words 'a', 'and', and 'the'."

Another compromise on the "requirements"... You mentioned "using limit(15) and sql_calc_found_rows in the same querry". Have you ever noticed that search engines say "rows 1-10 out of about 1,230,000"? They do not calculate the actual number; that would be prohibitively expensive. They "meet in the middle" by, say, calculating estimates daily for the common words. That's "good enough", and very fast for the end-user.

I have counseled many a MySQL user who 'paginates' their web pages with OFFSET and LIMIT, only to find miserable performance. I tell them that they don't really need to "jump to page 9876", all they really needs is "Next" and "Prev" pages (or maybe the next/prev few pages). Then I show them how to do it in Order(1) instead of Order(N) time. How often do you go past the first few pages?

Data Warehousing "reports" meet in the middle by having "summary" tables.

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: Multiple Databases - Search and retrieve.
3327
July 12, 2011 12:23AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.