MySQL Forums
Forum List  »  Performance

Re: Demanding Query
Posted by: Tobin Harris
Date: July 24, 2005 11:44AM

Hi Jay,

Your thoughts are much appreciated, thank you. Some comments...

I added the body text hash column this afternoon, and have finally managed to get something that is performant. Woohoo! I agree that 200 might be enough for deduping purposes, I'll put that to my client. Unfortunately we don't have sophisticated enough parsing to get at author, article and date, but I agree this would be good deduping criteria. You hit the nail on the head though - we're basically trying to eliminate syndicated content with our body text match.

Interestingly, when I was analysing the query, I found that thing were slow even once I got it using indexes (using union queries and hash fields). The *main* time eater was on the following clause...

"or ( body_hash = md5(left(?bodytext,900)) )"

If I pre-calculated the "md5(left(?bodytext,900))" in a separate statement, rather than inside the main query, the performance was much much better. Query time went from 6.5 seconds to 0.3 seconds following this tweak. I wandered if MySql was evaluating it for each row?

Your thoughts on funneling were interesting, and I'm pleased you said that because it's actually something we do already. Always good to hear someone confirm your own opinion! We have an initial dedupe check, and then a final one before the insert. We don't do it in a cron job though - new articles go into an in-memory queue and are inserted by background processes that chomp away at the queue continually.

Having used MSSQL 2000, I do miss the depth of information you can get explaining queries. MySql has a very simple explain plan. Also, MSSQL can use multiple indexes in a single query, which is quite handy. It will be interesting to see what MySql 5.0 brings!

Thanks for thoughts on InnoDB vs MyISAM. I think it will be worth trying - we don't do many updates or deletes after the original insert. We *dp* lots of SELECTS and INSERTS simultaneously, but from what you say this might be handled well my MyISAM.

Again, thank you very much for you help on this.

Tobin

Options: ReplyQuote


Subject
Views
Written By
Posted
2519
July 23, 2005 11:31AM
1786
July 23, 2005 01:52PM
1690
July 24, 2005 05:35AM
1655
July 24, 2005 09:39AM
Re: Demanding Query
1720
July 24, 2005 11:44AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.