Re: Another High CPU Usage Problem (long post!)
Hi Jay,
Thanks very much for your response. As usual, it is very informative and insightful!
1. I'll look into that IO issue, thanks for pointing that out.
Good point about separating the application into a number of separate subsystems. I'd recently been considering doing more "batch work". I imagine that, for example, if article inserts don't take place until after spidering, then the query cache will receive more hits, since inserts won't be invalidating it while the spider is running.
It may also create more opportunities for scaling, since these independent processes could perhaps run on separate machines. Luckily the code is quite cleanly separated into its various subsystems; it's just that everything is currently handled on a "one-by-one" basis without batching.
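As a rough sketch of the batching idea: collect articles while spidering and write them in one go afterwards. This is Python with sqlite3 standing in for MySQL purely so the example runs on its own; the table, column names and sample rows are made up for illustration, not taken from the real schema.

```python
import sqlite3

def store_batch(conn, articles):
    """Insert every collected (url, headline) pair in a single transaction."""
    with conn:  # commits once at the end, rolls back on error
        conn.executemany(
            "INSERT INTO articles (url, headline) VALUES (?, ?)",
            articles,
        )

# Hypothetical schema, in-memory database as a stand-in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (url TEXT PRIMARY KEY, headline TEXT)")

# During spidering, only queue the rows; nothing touches the table yet.
pending = []
for url, headline in [
    ("http://example.com/a", "First story"),
    ("http://example.com/b", "Second story"),
]:
    pending.append((url, headline))

# After spidering has finished, flush the whole queue at once.
store_batch(conn, pending)
row_count = conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
```

With this shape the table is written to once per spidering run instead of once per article, which is what would give the query cache a chance to stay valid in between.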
2. Yes, I suspected that the RLIKE was causing many full table scans! The rule here is that if we've got an article, then we're not interested in subsequent updates. Many news sites put the word "(updated)" in the headline when they re-publish an article. As you say, perhaps I could calculate this at storage time. It could mean running the regex in application code, and setting a flag such as "article_is_update".
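A minimal sketch of computing that flag in application code at storage time, in Python. The exact pattern is an assumption (the original RLIKE expression isn't shown here), as is matching case-insensitively; the function name just mirrors the proposed "article_is_update" column.

```python
import re

# Assumed pattern: the literal marker "(updated)", any case,
# with optional whitespace inside the parentheses.
UPDATED_RE = re.compile(r"\(\s*updated\s*\)", re.IGNORECASE)

def article_is_update(headline):
    """Return True if the headline carries an "(updated)" marker."""
    return UPDATED_RE.search(headline) is not None

# Computed once per article before the insert, then stored as a flag column.
flag_for_new = article_is_update("Storm hits coast")
flag_for_update = article_is_update("Storm hits coast (Updated)")
```

The later queries could then filter on the stored flag (which can be indexed) rather than running a regex over every headline.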
Again, thanks for your feedback; it really is *very* much appreciated.
Tobin