MySQL Forums

Statistics bottleneck on large table
Posted by: Nick Lott
Date: August 21, 2010 01:15PM

Hi there,
I'm having an issue where a simple query on a large table spends a relatively large amount of time in the "statistics" stage of the profiler.

The table is:

mysql> describe test;
+-------------+--------------+------+-----+---------+-------+
| Field       | Type         | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| rt          | datetime     | NO   |     | NULL    |       |
| vt          | datetime     | NO   | PRI | NULL    |       |
| lat         | decimal(6,3) | NO   | PRI | NULL    |       |
| lon         | decimal(6,3) | NO   | PRI | NULL    |       |
| data1       | decimal(5,0) | YES  |     | NULL    |       |
| data2       | decimal(5,0) | YES  |     | NULL    |       |
| data3       | decimal(5,0) | YES  |     | NULL    |       |
| data4       | decimal(5,0) | YES  |     | NULL    |       |
(...and a lot of very similar columns)
+-------------+--------------+------+-----+---------+-------+

With one multi-column key on three fields:

mysql> show keys from test;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| test  |          0 | PRIMARY  |            1 | vt          | A         |          50 |     NULL | NULL   |      | BTREE      |         |
| test  |          0 | PRIMARY  |            2 | lat         | A         |       18050 |     NULL | NULL   |      | BTREE      |         |
| test  |          0 | PRIMARY  |            3 | lon         | A         |    12996000 |     NULL | NULL   |      | BTREE      |         |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

This table gets truncated and refilled every day with about 20 million rows in a 7 GB .MYD file (the example here has only 13 million rows).

Only one query is ever run on this table (with different values):

SELECT * FROM `test` WHERE lat = 10 AND lon = 64 AND vt = '2010-08-23 00:00:00';

This only ever returns at most 3 rows (and normally just one). The EXPLAIN is as follows:

mysql> explain SELECT * FROM `test` WHERE lat = 10 AND lon = 64 AND vt = '2010-08-23 00:00:00';
+----+-------------+-------+-------+---------------+---------+---------+-------------------+------+-------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref               | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------------------+------+-------+
|  1 | SIMPLE      | test  | const | PRIMARY       | PRIMARY | 16      | const,const,const |    1 |       |
+----+-------------+-------+-------+---------------+---------+---------+-------------------+------+-------+
1 row in set (0.00 sec)

All fine so far. However, a range query on the datetime field is far too slow, so I have to issue many point lookups like this one to fetch several rows of data, and the cumulative query time is too high. (The query cache works great, but the same row is very rarely requested twice because there is application-level caching higher up.) An example profile is as follows:

mysql> show profile for query 1;
+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000019 |
| checking query cache for query | 0.000060 |
| Opening tables                 | 0.000010 |
| System lock                    | 0.000003 |
| Table lock                     | 0.000025 |
| init                           | 0.000086 |
| optimizing                     | 0.000017 |
| statistics                     | 0.016362 |
| preparing                      | 0.000020 |
| executing                      | 0.000003 |
| Sending data                   | 0.000175 |
| end                            | 0.000004 |
| query end                      | 0.000003 |
| freeing items                  | 0.000024 |
| storing result in query cache  | 0.000006 |
| logging slow query             | 0.000001 |
| cleaning up                    | 0.000002 |
+--------------------------------+----------+
17 rows in set (0.00 sec)

As you can see, it is spending most of its time in "statistics". That duration is hugely variable, ranging from 0.001 to 0.5 seconds, but tends towards the higher end.
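For context, the batched form I have been sketching to amortise the per-statement overhead looks like this (the literal values are placeholders; I don't know whether each UNION branch still pays the statistics cost separately, so this may not actually help):

```sql
-- Several point lookups in one round trip via UNION ALL;
-- each branch is still a const lookup on the full primary key.
SELECT * FROM `test` WHERE vt = '2010-08-23 00:00:00' AND lat = 10 AND lon = 64
UNION ALL
SELECT * FROM `test` WHERE vt = '2010-08-23 03:00:00' AND lat = 10 AND lon = 64
UNION ALL
SELECT * FROM `test` WHERE vt = '2010-08-23 06:00:00' AND lat = 10 AND lon = 64;
```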

I've tried numerous things to solve this bottleneck: upgrading to 5.1; increasing various tmp-table and memory-related parameters; converting the table to InnoDB (no difference, and slower to insert into); and replacing the single composite key with three separate keys (slower execution). None of these has made the slightest bit of difference. ANALYZE TABLE also seems to make no difference.
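To be concrete about the memory-related side (whether or not this exactly matches what I tried), the MyISAM knobs I mean are along these lines; the size is illustrative, not a recommendation:

```sql
-- Make the MyISAM key cache large enough to hold the whole PRIMARY index,
-- then preload it so lookups don't have to seek to disk for index blocks.
SET GLOBAL key_buffer_size = 536870912;  -- 512 MB
LOAD INDEX INTO CACHE test;
```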

I can find very little documentation about what the "statistics" stage actually does, but looking at the source I can see it is related to join optimisation and query planning. I find it a little strange that this happens at all given that there are no joins and only one usable key, so seeing it become the bottleneck as well is frustrating.

In "profile all" I notice that "statistics" is often doing some voluntary context switching (while mysql is not busy this is not an unloaded box). This perhaps explains the variability of the timing but I'm not sure why it would always be the statistics stage, why it's hanging around long enough to need to switch or why it's even bothering here?

One solution would be to disable the optimizer (assuming that is what is at work), but I can't find a way to do that. The only thing that looks close is FORCE INDEX, and that has no effect. I see there is optimizer_switch in 5.1, but it doesn't appear to be relevant.

Does anyone have any suggestions on where to look next? Splitting the table up has crossed my mind, but on further investigation the same problem occurs with only about 1 million rows, and given the application architecture it's something I'd like to avoid.
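If I did end up splitting it, the least invasive form would probably be native partitioning in 5.1 rather than separate tables, something like the sketch below (untested, and I have no evidence yet that it would avoid the statistics cost; vt is in the primary key, which the partitioning rules require):

```sql
-- Hash-partition on the day component of the datetime column.
ALTER TABLE test
  PARTITION BY HASH (TO_DAYS(vt))
  PARTITIONS 16;
```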

Thanks,
nick



