Re: Mass Import Innodb
> innodb_flush_log_at_trx_commit=2
I use value 2 to reduce i/o.
> That is, >99% of reads were satisfied without hitting the disk.
95% according to innotop
>With a battery-back-write-cache? If so, then writes are virtually free.
No unfortunately
>MySQL will not use much more than 1GB.
It is not a dedicated mysql server. I run more services on it. Are there any settings on conf that could cause problems?
>Fastest loading methods...
I'm aware that there are several times faster but I need to change the IDs of the records on the fly. As a result I can't use LOAD DATA FILE or INSERT INTO ... VALUES.
By using the SQL that I posted, I copy the data from the oldDB.documentlist to the newDB.documentlist and I change the value of the id_document. The new values of the ID are stored on temp_docids table. The reordering must take place, since it improves dramatically the speed of the selects on the new DB (the documents are sorted by popularity).
Here are the 3 tables. You will notice that documentlist table schema is the same in both DBs.
CREATE TABLE `oldDB`.`documentlist` (
`id_document` int(10) unsigned NOT NULL,
`name` varchar(255) DEFAULT NULL,
`longHTMLdescr` blob,
`fileData` mediumblob,
`lastupdated` datetime DEFAULT NULL,
`status` tinyint(3) unsigned DEFAULT NULL,
`code` smallint(5) unsigned DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED;
CREATE TABLE `newDB`.`documentlist` (
`id_document` int(10) unsigned NOT NULL,
`name` varchar(255) DEFAULT NULL,
`longHTMLdescr` blob,
`fileData` mediumblob,
`lastupdated` datetime DEFAULT NULL,
`status` tinyint(3) unsigned DEFAULT NULL,
`code` smallint(5) unsigned DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED;
CREATE TABLE `newDB`.`temp_docids` (
`id_document` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_document_old` int(10) unsigned NOT NULL,
PRIMARY KEY (`id_document`) USING BTREE,
KEY `Index_id_document_old` (`id_document_old`,`id_document`),
KEY `Index_id_document_old2` (`id_document`,`id_document_old`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
The temp_docsids stores the mapping between the new and the old ID. That's the schema of the latest version that I use. I added also 2 indexes just for testing but it does not prefer to use any of them. Forcing them has the same effect.
I have ensured the following: the Hard Disk is not used at all (10-15MB/s reads&writes), CPU utilization is very low and even when memory is available to the sytem MySQL does not use it (probably due to the innodb_buffer_pool_size Value).
Finally as I found out the mapping between old and new IDs looks like this:
new Old
1 => 5012
2 => 5014
3 => 100000
4 => 12500000
5 => 12500001
6 => 3
...
In a worst case scenario this could lead to reading 1 disk page per table row, which is slow. Nevertheless even in this case, I am able to copy at the same time huge files to the hard disk extremely fast (200-500MB/s). This shows that the problem is not on the HD but rather on MySQL configuration/query/or similar.
Any clues at all?