MySQL Forums
Forum List  »  Performance

Statistics bottleneck on large table
Posted by: Nick Lott
Date: August 21, 2010 01:15PM

Hi there,
I'm having a issue where a simple query on a large table is spending a relatively large amount of time in the "statistics" stage in the profiler.

The table is:

mysql> describe test;
+-------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| rt | datetime | NO | | NULL | |
| vt | datetime | NO | PRI | NULL | |
| lat | decimal(6,3) | NO | PRI | NULL | |
| lon | decimal(6,3) | NO | PRI | NULL | |
| data1 | decimal(5,0) | YES | | NULL | |
| data2 | decimal(5,0) | YES | | NULL | |
| data3 | decimal(5,0) | YES | | NULL | |
| data4 | decimal(5,0) | YES | | NULL | |
and a lot of very similar columns
+-------------+--------------+------+-----+---------+-------+

With one multi-column key on three fields:

mysql> show keys from test;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| test | 0 | PRIMARY | 1 | vt | A | 50 | NULL | NULL | | BTREE | |
| test | 0 | PRIMARY | 2 | lat | A | 18050 | NULL | NULL | | BTREE | |
| test | 0 | PRIMARY | 3 | lon | A | 12996000 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

This table gets truncated and refilled every day with about 20 million rows in a 7Gb MYD file (this example only has 13m rows).

Only one query is ever run on this table (with different values):

SELECT * FROM `test` WHERE lat = 10 AND lon = 64 AND vt = '2010-08-23 00:00:00';

Which only ever returns a maximum of 3 rows (and normally only one). The explain is as follows:

mysql> explain SELECT * FROM `test` WHERE lat = 10 AND lon = 64 AND vt = '2010-08-23 00:00:00';
+----+-------------+-------+-------+---------------+---------+---------+-------------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------------------+------+-------+
| 1 | SIMPLE | test | const | PRIMARY | PRIMARY | 16 | const,const,const | 1 | |
+----+-------------+-------+-------+---------------+---------+---------+-------------------+------+-------+
1 row in set (0.00 sec)

All fine so far. However, because doing a range request on the datetime field is far too slow I have to do multiple requests like this to get several rows of data, and the cumulative query time is too slow (the query cache works great but the same row is very rarely requested twice as there is application caching at a higher level). An example profile is as follows:

mysql> show profile for query 1;
+--------------------------------+----------+
| Status | Duration |
+--------------------------------+----------+
| starting | 0.000019 |
| checking query cache for query | 0.000060 |
| Opening tables | 0.000010 |
| System lock | 0.000003 |
| Table lock | 0.000025 |
| init | 0.000086 |
| optimizing | 0.000017 |
| statistics | 0.016362 |
| preparing | 0.000020 |
| executing | 0.000003 |
| Sending data | 0.000175 |
| end | 0.000004 |
| query end | 0.000003 |
| freeing items | 0.000024 |
| storing result in query cache | 0.000006 |
| logging slow query | 0.000001 |
| cleaning up | 0.000002 |
+--------------------------------+----------+
17 rows in set (0.00 sec)

As you can see it seems to be spending a lot of time in "statistics". It's hugely variable, ranging from 0.001 to 0.5 seconds, but tending towards the higher end.

I've tried numerous things to get solve this bottleneck: upgrading to 5.1; increasing various tmp table and memory related parameters; converting the table to innodb (no difference, slower to insert into); I've tried replacing the single key with three separate ones (slower execution). None of these things seem to have made the slightest bit of difference. "Analyze table" seems also to make no difference.

I can find very little documentation about what "statistics" actually is doing but looking in the source I can see it's to do with optimising joins and query planning. I find it a little strange that this is even happening given that there are no joins and only one available key, so it actually also being the bottleneck is frustrating.

In "profile all" I notice that "statistics" is often doing some voluntary context switching (while mysql is not busy this is not an unloaded box). This perhaps explains the variability of the timing but I'm not sure why it would always be the statistics stage, why it's hanging around long enough to need to switch or why it's even bothering here?

One solution would be to disable the optimizer (assuming that's what it is) but I can't find a way to do that. The only thing that looks close is "FORCE INDEX" and that has no effect. I see there is the optimizer_switch in 5.1 but that doesn't appear to be relevant.

Does anyone have any suggestions on where to look next? Splitting the table up had crossed my mind, but on further investigation the same problem seems to occur with only about 1m rows and given the application architecture it's something I'd like to avoid.

Thanks,
nick



Edited 1 time(s). Last edit at 08/21/2010 01:19PM by Nick Lott.

Options: ReplyQuote


Subject
Views
Written By
Posted
Statistics bottleneck on large table
11780
August 21, 2010 01:15PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.