MySQL Forums

Re: 11 millions records : more than 30 seconds to retrieve simple request
Posted by: Rick James
Date: January 05, 2011 08:53AM

The two main ways to speed up INSERTs of lots of rows:

* Create a csv file, then use LOAD DATA INFILE

* 'Batch insert': INSERT INTO foo (a,b) VALUES (1,2), (3,4), (5,6), ...
I recommend batches of 100. That's enough to give you about 10x speedup; more than 100 is into diminishing returns.
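A minimal sketch of both approaches (the table `foo`, its columns, and the file path are hypothetical, for illustration only):

```sql
-- Bulk load from a CSV file: usually the fastest option.
LOAD DATA INFILE '/tmp/data.csv'
INTO TABLE foo
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(a, b);

-- Batch insert: aim for roughly 100 tuples per statement.
INSERT INTO foo (a, b) VALUES
(1, 2),
(3, 4),
(5, 6);   -- ...continue up to about 100 tuples per statement
```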

There are several reasons why InnoDB may have been slow. With these settings, every statement is its own transaction and every commit forces a flush to disk:
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1
autocommit = 1
It is best to group the inserts into medium-sized chunks, one chunk per transaction. (Small batches -> lots of per-transaction overhead; huge batches -> risk of overflowing the transaction log.)
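One way to get medium-sized chunks per transaction, sketched with the same hypothetical table `foo`:

```sql
-- Turn off autocommit so each chunk, not each statement, pays the commit cost.
SET autocommit = 0;

START TRANSACTION;
INSERT INTO foo (a, b) VALUES (1, 2), (3, 4), (5, 6);   -- ~100 rows per statement
INSERT INTO foo (a, b) VALUES (7, 8), (9, 10);          -- ...more batch INSERTs...
COMMIT;                                                 -- one log flush per chunk

-- Repeat START TRANSACTION ... COMMIT for each chunk of a few thousand rows.
```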

Another way to get a monthly batch insert into InnoDB: Load into MyISAM, perhaps without any indexes, then ALTER TABLE with ENGINE=InnoDB and ADD PRIMARY KEY and any other indexes. Loading MyISAM without any indexes will write about as fast as one can write a file. The ALTER will do the rest of the work all at once, and reasonably efficiently. I think no transactions will be involved.
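A sketch of that monthly rebuild; the staging table, its columns, and the file path are made up for the example:

```sql
-- Load fast into an index-free MyISAM table, then convert and index once.
CREATE TABLE staging (
    id INT UNSIGNED NOT NULL,
    variable VARCHAR(100) NOT NULL,
    value DOUBLE NOT NULL
) ENGINE=MyISAM;           -- no indexes yet: writes go at nearly file speed

LOAD DATA INFILE '/tmp/monthly.csv'
INTO TABLE staging
FIELDS TERMINATED BY ',';

ALTER TABLE staging
    ENGINE=InnoDB,
    ADD PRIMARY KEY (id),
    ADD INDEX (variable);  -- engine conversion and all indexes in one pass
```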

Is your monthly upload a replacement? Or new data for an existing table? The previous paragraph assumes a replacement. If it is adding to the existing table, then the batch insert techniques apply.

(I'm not saying you should switch to InnoDB, but merely giving you the tools to make it a success if you care to try. I cannot predict that it will necessarily be better for your application as a whole.)

SHOW TABLE STATUS;
This will give you the data and index sizes of your tables. Is this one table the bulk of what you have? Compare the index size (for MyISAM) to
key_buffer_size = 128M
If the buffer is bigger than the index size(s), the indexes will (eventually) be fully cached.
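For example (the table name is hypothetical), look at the Data_length and Index_length columns, which are in bytes:

```sql
-- Per-table data and index sizes, in bytes:
SHOW TABLE STATUS LIKE 'foo';

-- The current key buffer setting, in bytes, to compare against Index_length:
SHOW VARIABLES LIKE 'key_buffer_size';
```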

For a 4GB machine and 32-bit MySQL, I would not go past 700M for key_buffer. And make sure the OS will let a single process get bigger than that.

For 4GB and 64-bit MySQL, 1200M might be a reasonable limit.

If you switch to InnoDB,
key_buffer_size = 20M
innodb_buffer_pool_size = 1500M (32-bit) or 2500M (64-bit)

If you are loading once a month, does it really matter that it takes 20 hours? Sure, networking and multiple machines would probably let you do the INSERTs in parallel and shrink the 20 hours. But I suggest that is complexity you don't need to bother with (in this case).

Let's look at "variable". What is the average length? How many distinct values are there? If it is long and there are not many different values, then "normalization" may be indicated. That would involve an extra table with an AUTO_INCREMENT number (perhaps a MEDIUMINT UNSIGNED) and "variable". Then the main table would have that id instead of variable. (Smaller --> ...)
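A sketch of that normalization; the table and column names (`variables`, `main_table`, `variable_id`) are hypothetical:

```sql
-- Lookup table: one row per distinct value of "variable".
CREATE TABLE variables (
    variable_id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
    variable VARCHAR(255) NOT NULL,
    PRIMARY KEY (variable_id),
    UNIQUE KEY (variable)
);

-- Populate it from the existing data; IGNORE skips duplicates.
INSERT IGNORE INTO variables (variable)
    SELECT DISTINCT variable FROM main_table;

-- Then the main table carries the 3-byte id instead of the string:
-- ALTER TABLE main_table ADD COLUMN variable_id MEDIUMINT UNSIGNED;
-- UPDATE main_table m JOIN variables v USING (variable)
--     SET m.variable_id = v.variable_id;
```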
