Re: slow update-join query
> MySQL version? Amt of RAM available to MySQL? Does
> the machine swap during the query?
version 5.7.27, its running on a 16Gb Mac, I don't know if machine was swapping during the query but I didnt have much executing on the machine at the time.
> Let's see the results of ...
> show create table table1;
| table1 | CREATE TABLE `table1` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`source_article_id` bigint(20) DEFAULT NULL,
`version` float DEFAULT NULL,
`current_version` tinyint(1) DEFAULT '0',
`cola` text COLLATE utf8mb4_unicode_ci,
`colb` text COLLATE utf8mb4_unicode_ci,
`colc` text COLLATE utf8mb4_unicode_ci,
`cold` text COLLATE utf8mb4_unicode_ci,
`cole` text COLLATE utf8mb4_unicode_ci,
PRIMARY KEY (`id`),
KEY `said_idx` (`source_article_id`),
KEY `v_idx` (`version`)
) ENGINE=InnoDB AUTO_INCREMENT=147781426 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci |
> show create table table2;
| table2 | CREATE TABLE `table2` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`source_article_id` bigint(20) DEFAULT NULL,
`max_version` float DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `said_idx` (`source_article_id`),
KEY `v_idx` (`max_version`)
) ENGINE=InnoDB AUTO_INCREMENT=33403055 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci |
I left the relevant column names as is and renamed some others.
Actual query is:
update table1 a, table2 t set a.current_version = 1 where a.source_article_id = t.source_article_id and a.version = t.max_version;
> explain analyze update table1 a, table2 t set
> a.col1 = 1 where a.col2 = t.col2 and a.col3 =
> t.col3;
id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
'1', 'SIMPLE', 't', NULL, 'ALL', 'said_idx,v_idx', NULL, NULL, NULL, '32306234', '100.00', 'Using where'
'1', 'UPDATE', 'a', NULL, 'ref', 'said_idx,v_idx', 'said_idx', '9', 'bdx_build.t.source_article_id', '4', '1.10', 'Using where'
I was running some other test queries today, and after trying those the explain now looks a bit different than yesterday. The rows and filtered values have changed now - rows used to be 150mil+ for SIMPLE type (I might have misread that one, not sure how it can be more than the rows to update) and filtered used to be 4.88 for UPDATE type (this one I am pretty sure about).
What I tried was to add a filter condition on source_article_id
where a.source_article_id >= x and a.source_article_id < y
There are about 35 million source-article-ids in table1.
I tried a range of 10k, 1mil and 10mil and they took 0.5 secs, 40 secs and 24 minutes respectively. The 10 mil range updated about 34 mil rows. So I kind of updated 1/4th of the table in 24 minutes.
Thanks!