MySQL Forums
Forum List  »  InnoDB

Posted by: James Day
Date: January 28, 2005 04:45PM

We used to do it real time but load issues at peak times forced us to switch to batch updating. A few months ago slave load issues caused us to stop replicating the search table during those batch updates and now it's done on the master only and copied to offline slaves. To give some idea, a load table from master for all of our current article text takes about 3-4 hours to run on a dual Opteron slave when it's out of service. For some idea of volume, we see about 400,000 to 500,000 page changes per week. Using batch operations significantly cut the number of updates we needed to because the same page tends to be edited many times. The dual part in the Opteron doesn't seem to be of much value for this - seems to be single-threaded.

Today index updates are processed on the master via a cron job and I take pairs of slaves offline off peak to copy the tables across. We've pretty much outgrown MyISAM fulltext search and have been experimenting with other approaches, notably Lucene, which, with its index merging capability, may help to handle the update time issues we've encountered. On the other hand, most places aren't sites in the top 100 English language or 150 worldwide, with our volume of updates, so it's very capable for most users. We're definitely in the push it until it breaks category.

For our load, a 3GB P4 using a java-based Lucene engine handled load comparable to about 2-3 4GB dual Opterons with 6 disk RAID 10 sets. Assorted issues still to be resolved before we actually switch - technical and political - and it might not happen. Until then I plan to experiment with some non-replicating 3GB P4s and see how the lower disk load and fairly high RAM level lets them do compared to the main database servers. I expect them to do fairly well - better in cost-effectiveness than the standard database servers - but we'll see. May be glitches in our code which rules them out or my expectation that they will do fairly well may be disproved.

At present, search is about 50-70% of our database load and an ongoing scalability challenge because it's one of the things we don't cache. We're seeing an increasing tendency of people to come to us first. That's good for mindshare but not so good for an area where we're still addressing scalability...:)

Note also that our search query and schema has significant overhead not related to MySQL full text search - the full text portion is actually by far the faster part of the search. Fixes for this are planned in the next version of the MediaWiki software.

Options: ReplyQuote

Written By
January 24, 2005 07:13AM
January 24, 2005 01:10PM
January 25, 2005 01:37AM
January 26, 2005 07:50AM
January 28, 2005 04:34AM
January 28, 2005 04:45PM
February 01, 2005 07:04AM
February 28, 2005 12:54AM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.