>Do you have blobs? Can you compress them? That will save a significant amount of I/O (and space).
I have 2 blobs in the table, one medium and one small. I use compression and indeed it REALLY helps I/O and space. Unfortunately though I have read several articles that mention that by compressing the tables you increase the insertion time.
>It is unclear whether you should sort before inserting or insert randomly. Need to see the details.
I sort before inserting because this heavily affects the speed and reduces I/O when the table gets bigger. In the past I used to insert randomly and after a while the speed of INSERTs decreased a lot. Also the empty space of the innodb table files heavily increased! That's why now I insert the records sorted by the new ID. As you'll see below this leads to "random" reads from the old table, which affects the speed.
Here is the table schema:
CREATE TABLE `documentlist` (
`id_document` int(10) unsigned NOT NULL,
`name` varchar(255) DEFAULT NULL,
`longHTMLdescr` blob,
`fileData` mediumblob,
`lastupdated` datetime DEFAULT NULL,
`status` tinyint(3) unsigned DEFAULT NULL,
`code` smallint(5) unsigned DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED;
I was about to provide you the INSERT..SELECT statement but after testing I have found that a particular segment of the SELECT part of the statement is the slow one. Here is a simplified version of the statement that is executed slowly (40 seconds). The whole INSERT...SELECT is executed in 41 secs so that's the problematic part:
SELECT `id_document`,`name`
FROM `oldDB`.`documentlist`
WHERE `id_document` in (15371963,14864105,13406499,13230248,16477007,10207882,14408890,.......);
inside the in() there are 1000 IDs. I am forced to select them with this particular order in order to do the inserts on the new table in a sorted manner. Selecting them in a different way is not an option (I've heavily tested this). I am sure that the above records are stored in different disk pages and this leads to slow execution. Still though HD is not used at all and it is able to perform reads at 500MB/s for other tasks at the same time. That's why I believe that it is a problem of the memory or buffers. Also CPU usage is very low.
Here is the explain of the above query. As you see the PRIMARY key is used normally:
1, 'SIMPLE', 'documentlist', 'range', 'PRIMARY', 'PRIMARY', '4', '', 1000, 'Using where'
Here are the buffers:
'bulk_insert_buffer_size', '4096'
'innodb_buffer_pool_instances', '1'
'innodb_buffer_pool_size', '1074790400'
'innodb_change_buffering', 'all'
'innodb_log_buffer_size', '10485760'
'join_buffer_size', '1048576'
'key_buffer_size', '134217728'
'myisam_sort_buffer_size', '1048576'
'net_buffer_length', '65536'
'preload_buffer_size', '32768'
'read_buffer_size', '2097152'
'read_rnd_buffer_size', '2097152'
'sort_buffer_size', '1048576'
'sql_buffer_result', 'OFF'
I really don't understand what's wrong. If it is not a problem on the buffers (most likely), then it's probably a bug?