MySQL Forums
Forum List  »  Performance

Re: word index - is there a better solution?
Posted by: Craig Logan
Date: June 06, 2010 10:13AM

hey rick,

I took your advice and modified the table (added a second)

CREATE TABLE `words_in_docs` (
  `word_id` mediumint(7) unsigned NOT NULL default '0',
  `doc_id` int(9) unsigned NOT NULL default '0',
  `word_location` tinyint(1) unsigned NOT NULL default '0',
  `lang` char(2) character set ascii NOT NULL default 'en',
  PRIMARY KEY  (`lang`,`word_location`,`word_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci 


CREATE TABLE `words_in_docs_sort` (
  `doc_id` int(9) unsigned NOT NULL default '0',
  `word_id` mediumint(7) unsigned NOT NULL default '0',
  `word_location` tinyint(1) unsigned NOT NULL default '0',
  `word_sort` smallint(5) unsigned NOT NULL default '0',
  `lang` char(2) character set ascii NOT NULL,
  PRIMARY KEY  (`lang`,`word_location`,`doc_id`,`word_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

well, I'm not sure about this. while I clearly see the benefit of having a smaller "lookup" table, I'm not sure it will pay of because I still have to have the other table and actually use a join or union to merge it into the first.

I guess it will depend on how I solve the problem with selecting th eproper wort sort. I had thought of along the lines of using concat to generate a string of the word sort numbers which I can check against the desired sort order, but there are probably too many problems associated with this.

do you have an idea for that?

############

I'm not yet sure about the stop words, but there will probably and interface that allows the removal of certain words or to map them to other words.

############

As in idea, I could drop the word sort table and instead make the software learn which search terms produce the most popular results and then just throw out the most popular results for the search word on top and teh rest just sorted by date or so. I think youtube is doing it like that.

############

cluster indeed comes with the problem that its intended to be used as an in-memory database. if you create lots of little mySQL clusters, however, you could get the fault tolerance desired. you just would have to live with the fact that you would only have like 128 or 256GB memory per cluster.

maybe it's possible, though, to "cheat" cluster by setting up servers with lots of swap space and just let the servers use that. no idea if cluster is smart enough to detect that, but if it works you could get pretty far if using a raid 0 of SSD drives.

############

I'm a caching fanatic. I thrill on getting any webpage process in <0.00x seconds.

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: word index - is there a better solution?
1241
June 06, 2010 10:13AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.