Re: Load Data Infile takes a long time
RAM => 94G
Had to partition by month since its easier to TRUNCATE an entire month of data easily. I usually do it for last 3 months. Maybe can join all earlier partitions together.
table_open_cache => 2048
Planning to change the table definition to look like this. Including 'date' as well in the primary key section so that I can partition on date:
CREATE TABLE `mobile_metrics_daily_partitioned_test` (
`date` date DEFAULT NULL,
`project_id` int(11) DEFAULT NULL,
`age_group_id` int(11) DEFAULT NULL,
`gender_id` int(11) DEFAULT NULL,
`country_id` int(11) DEFAULT NULL,
`language_id` int(11) DEFAULT NULL,
`user_type_id` int(11) DEFAULT NULL,
`channel_id` int(11) DEFAULT NULL,
`dim1_id` int(11) NOT NULL DEFAULT '0',
`dim2_id` int(11) NOT NULL DEFAULT '0',
`metric_id` int(11) NOT NULL,
`value` float DEFAULT NULL,
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`last_modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`platform_id` int(11) DEFAULT '0',
PRIMARY KEY (`id`, `date`),
UNIQUE KEY `date_2` (`date`,`project_id`,`age_group_id`,`gender_id`,`country_id`,`language_id`,`user_type_id`,`channel_id`,`dim1_id`,`dim2_id`,`metric_id`,`platform_id`),
KEY `metric_id` (`metric_id`),
KEY `project_id` (`project_id`),
KEY `date` (`date`),
KEY `game_metric_date` (`project_id`,`metric_id`,`date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY RANGE ( TO_DAYS(date))
(PARTITION p_200701 VALUES LESS THAN (TO_DAYS("2007-02-01")),
PARTITION p_201212 VALUES LESS THAN (TO_DAYS("2013-01-01")) ,
PARTITION p_201301 VALUES LESS THAN (TO_DAYS("2013-02-01")) ,
PARTITION p_201302 VALUES LESS THAN (TO_DAYS("2013-03-01")) ,
.
.
.
PARTITION p_201411 VALUES LESS THAN (TO_DAYS("2014-12-01")) ,
PARTITION p_201412 VALUES LESS THAN (TO_DAYS("2015-01-01")) ,
PARTITION p_ALL VALUES LESS THAN MAXVALUE);
Not changing the gender_id stuff for now, but will do that soon.
Typical Select queries would be looking for date, project_id & metric_id. There would be aggregation done as well.
Anything else that I need to take care before I make a huge load-data-infile run?