MySQL Forums
Forum List  »  Performance

Poor scaling, odd performance behavior
Posted by: Mark Modrall
Date: October 24, 2005 01:51PM

Hi...

I've got an app that's trying to aggregate a lot of log data from across a number of daily log tables. The tables it's aggregating into are then intended to produce various reports.

To try and reduce the sheer number of bytes in the aggregates, I put the longer text elements that could possibly be repeated into one table with an AUTO_INCREMENT primary key, and then put the associated statistics into other tables based on the primary id column. The upside is that if a bunch of people enter the same "blah blah blah" in the log over and over, I'll only have one copy in the aggregate.

I got advice here earlier about using an md5 hash for the unique index on the user input, and that helped some, but I'm finding as the aggregate tables grow (at this point I'm up to 16M rows), performance really goes to the toilet.

My app does a
SELECT SQL_BIG_RESULT input, user, count(user) FROM logday_<day> GROUP BY input, user order by NULL;

then it reads the results and
insert input, md5(input) into main_table ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id);
insert id, user, count(user) into stats_table;

for each of the results from a given day. The program loops over as many days as you tell it to do at once.

As the main_table gets bigger and performance degrades, I also notice the odd behavior that the first day in the loop performs a bit more than twice as badly as other days in the run, even though they're using the same Connection objects. In other words, if all days have about ~400,000 records to aggregate, the first day will take 2 hours to load while subsequent days will take a little under an hour. Taking an hour to insert 400,000 records is already a long time, but for some reason the first one takes over 2.

I'm using MySql 4.1.12 and .Net Connector 1.0.6. The client side is running on a sizable Windows XP box while the server side is a quad proc linux box with 1GB of ram and a 10-disk raid for the storage.

Anyone have any ideas? Also, if I were to make a covering index on (MD5(input), id), would that help the performance of ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id)? I would guess that it would, but I don't think you can get an EXPLAIN on that.

Thanks
_mark

Options: ReplyQuote


Subject
Views
Written By
Posted
Poor scaling, odd performance behavior
1725
October 24, 2005 01:51PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.