Hi all
I am seeing a strange performance behaviour with partitioned tables. I have tested this thoroughly with MariaDB 10.1.16 but have also experienced a similar issue on Mysql 5.5.44 (altough I have not yet executed the tests below).
In short, the issue is that a SELECT COUNT(*) query over the exact same dataset takes longer to execute in a partitioned table, even when all the data is in just one partition. In my real-world scenario, I am seeing a performance degradation of around 3 times.
To my understanding, if all rows reside in a single partition, the performance of such a query should be almost identical whether or not the table is partitioned. However I am noting considerable differences, with the query on the partitioned table always taking longer to execute.
From the query plan, I can see that MySql tries to use the index with the lowest cardinality (both in the unpartitioned and partitioned cases) - this is understandable since most probably InnoDB will simply traverse the smallest index and count the rows that way. When forcing the query to use the Primary Key index, there does not seem to be a difference between the partitioned and unpartitioned table.
Below are the steps to replicate the problem being encountered on a smaller scale. (Note that the bash script to generate data will need around 2GB disk space):
Unpartitioned Table:
CREATE TABLE `my_table` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`class` enum('a','b','c','d','e','f') NOT NULL,
`comment` varchar(30) NOT NULL DEFAULT 'Sometext',
PRIMARY KEY (`id`),
KEY `idx_b` (`class`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Partitioned Table. All data will reside in partition p1:
CREATE TABLE `my_table_partitioned` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`class` enum('a','b','c','d','e','f') NOT NULL,
`comment` varchar(30) NOT NULL DEFAULT 'Sometext',
PRIMARY KEY (`id`),
KEY `idx_b` (`class`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (id)
(PARTITION `start` VALUES LESS THAN (0) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (90000000) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (160000000) ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN (210000000) ENGINE = InnoDB,
PARTITION future VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;
Inputting some data into table my_table:
for i in `seq 200`
do
(
echo "INSERT INTO my_table(class) VALUES "
for o in `seq 100000`
do
echo "("$((1 +( RANDOM % 5)))"),"
done
echo '(1);'
)
done | mysql my_database;
Replicating data from my_table into my_table_partitioned:
MariaDB [my_database]> INSERT INTO my_table_partitioned SELECT * FROM my_table;
Query OK, 20000200 rows affected (6 min 22.23 sec)
Records: 20000200 Duplicates: 0 Warnings: 0
Explaining COUNT(*) queries on both tables:
MariaDB [my_database]> EXPLAIN SELECT COUNT(*) FROM my_table;
+------+-------------+----------+-------+---------------+-------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------+-------+---------------+-------+---------+------+----------+-------------+
| 1 | SIMPLE | my_table | index | NULL | idx_b | 1 | NULL | 19460953 | Using index |
+------+-------------+----------+-------+---------------+-------+---------+------+----------+-------------+
1 row in set (0.00 sec)
MariaDB [my_database]> EXPLAIN SELECT COUNT(*) FROM my_table_partitioned;
+------+-------------+----------------------+-------+---------------+-------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------------------+-------+---------------+-------+---------+------+----------+-------------+
| 1 | SIMPLE | my_table_partitioned | index | NULL | idx_b | 1 | NULL | 19463673 | Using index |
+------+-------------+----------------------+-------+---------------+-------+---------+------+----------+-------------+
1 row in set (0.00 sec)
Running a number of COUNT(*) on both tables:
MariaDB [my_database]> SELECT SQL_NO_CACHE COUNT(*) FROM my_table;
+----------+
| count(*) |
+----------+
| 20000200 |
+----------+
1 row in set (4.53 sec)
MariaDB [my_database]> SELECT SQL_NO_CACHE COUNT(*) FROM my_table;
+----------+
| count(*) |
+----------+
| 20000200 |
+----------+
1 row in set (4.41 sec)
MariaDB [my_database]> SELECT SQL_NO_CACHE COUNT(*) FROM my_table;
+----------+
| count(*) |
+----------+
| 20000200 |
+----------+
1 row in set (4.30 sec)
MariaDB [my_database]> SELECT SQL_NO_CACHE COUNT(*) FROM my_table_partitioned;
+----------+
| COUNT(*) |
+----------+
| 20000200 |
+----------+
1 row in set (6.97 sec)
MariaDB [my_database]> SELECT SQL_NO_CACHE COUNT(*) FROM my_table_partitioned;
+----------+
| COUNT(*) |
+----------+
| 20000200 |
+----------+
1 row in set (6.91 sec)
MariaDB [my_database]> SELECT SQL_NO_CACHE COUNT(*) FROM my_table_partitioned;
+----------+
| COUNT(*) |
+----------+
| 20000200 |
+----------+
1 row in set (6.92 sec)
Similar timings are achieved if the partition is specified explicitly:
MariaDB [my_database]> SELECT SQL_NO_CACHE COUNT(*) FROM my_table_partitioned PARTITION(p1);
+----------+
| COUNT(*) |
+----------+
| 20000200 |
+----------+
1 row in set (7.85 sec)
MariaDB [my_database]> SELECT SQL_NO_CACHE COUNT(*) FROM my_table_partitioned PARTITION(p1);
+----------+
| COUNT(*) |
+----------+
| 20000200 |
+----------+
1 row in set (6.69 sec)
Below is the profile of the query on the partitioned table - but to me it does not explain where the slowdown is coming from.
MariaDB [my_database]> SET PROFILING=1;
Query OK, 0 rows affected (0.00 sec)
MariaDB [my_database]> SELECT SQL_NO_CACHE COUNT(*) FROM my_table_partitioned PARTITION(p1);
+----------+
| COUNT(*) |
+----------+
| 20000200 |
+----------+
1 row in set (7.39 sec)
MariaDB [my_database]> SHOW PROFILE FOR QUERY 1;
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000196 |
| checking permissions | 0.000029 |
| Opening tables | 0.000174 |
| After opening tables | 0.000057 |
| System lock | 0.000032 |
| Table lock | 0.000025 |
| init | 0.000051 |
| optimizing | 0.000030 |
| statistics | 0.000096 |
| preparing | 0.000077 |
| executing | 0.000037 |
| Sending data | 7.388734 |
| end | 0.000032 |
| query end | 0.000018 |
| closing tables | 0.000010 |
| Unlocking tables | 0.000039 |
| freeing items | 0.000013 |
| updating status | 0.000035 |
| cleaning up | 0.000004 |
+----------------------+----------+
19 rows in set (0.00 sec)
The performance degradation here is not as bad as in my real-world scenario. However I think I am reproducing the same problem but on a smaller scale. Is this a problem with the way the tests are being executed, with MySql itself or with my assumptions in the first place?
Thanks a lot.
--
Darren Demicoli
Devops Engineer
http://www.xcaliber.com