MySQL Forums
Forum List  »  Newbie

Re: Slow queries with very large table
Posted by: Rick James
Date: January 08, 2014 11:58AM

> But given pk is the Primary Key and it's automatically indexed, shouldn't be equivalent?

Do EXPLAIN SELECT COUNT(...) ...
to see what it will do. COUNT(*) may be executed the same way as COUNT(pk), or it may use a secondary index.

In theory, COUNT(*) lets it pick the best.
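To compare the two, run EXPLAIN on each variant and look at the "key" column to see which index would be scanned (a sketch; `your_table` and `pk` stand in for your actual table and primary key):

```sql
-- Compare the plans; the "key" column shows which index each scan uses.
EXPLAIN SELECT COUNT(*)  FROM your_table;
EXPLAIN SELECT COUNT(pk) FROM your_table;
```

If both show the same key, the two counts will cost the same.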

> Another hypothetical explanation is that our database relies in a single physical file. We believe, that disk access will be improved if we split the database into one different file per table.

No. InnoDB is arranged in 16KB blocks, across one (ibdata1) or many (.ibd) disk files ('tablespaces'). A single connection will mostly read one block at a time. Multiple connections will each be reading blocks. The blocks will be seemingly randomly arranged on disk.

The best way to spread InnoDB across more than one physical disk is to use RAID striping. This gives some chance that the load will be evenly spread across the drives. Manually placing things (tables, partitions, data, index, etc) on different drives does a poor job because normal access patterns are biased toward one thing, thereby leading to one drive being accessed more than the others.

If your system is I/O bound (and you have multiple physical drives), RAID will help some, but the real solution is to look at the queries and figure out how to rewrite them to avoid I/O. Such rewriting can sometimes give you an order of magnitude speedup.
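One common form of such a rewrite is a "covering" index, so the query is answered from a small secondary index instead of the full rows. Hypothetical table and column names, purely illustrative:

```sql
-- Before: scanning wide rows causes heavy I/O on a large table.
SELECT customer_id, SUM(amount)
FROM   orders
WHERE  order_date >= '2014-01-01'
GROUP  BY customer_id;

-- An index containing every referenced column lets InnoDB answer
-- the query from the index alone (EXPLAIN then shows "Using index"):
ALTER TABLE orders
    ADD INDEX ix_cover (order_date, customer_id, amount);
```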

If you are not I/O bound, then perhaps it is "premature optimization".

> It took 275 seconds, after that one, I retried and only 3 seconds.

275 -- probably cold cache and very I/O bound. 3 -- probably fully cached, and CPU-bound.

10 million rows is moderately large -- it is about the 90th percentile of the tables discussed on this forum.

Back to COUNT()...
* COUNT(*) vs COUNT(pk) -- maybe no difference; see EXPLAIN
* COUNT(pk) vs COUNT(non_pk) -- depends on what index is used and how big it is.
* PARTITION -- no speedup; possibly slowdown. (A future release may change this.)
* Moving things around on disk -- no speedup
* Multiple drives + RAID striping (RAID-0, -5, or -10) -- some speedup, depending on how good the readahead logic is. Speedup factor not more than the number of drives.
* Shrinking BIGINT to INT for all such fields -- up to 2x speedup because of less data to traverse. (I say "up to" because if it is cached, there may be no speedup.)

Those tactics tend to be orthogonal -- that is, each speedup is independent of the others.
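The BIGINT-to-INT shrink looks like this (illustrative column names; INT UNSIGNED tops out at about 4.3 billion, so check your actual values first):

```sql
-- BIGINT is 8 bytes; INT is 4 -- half the data to read and cache.
ALTER TABLE your_table
    MODIFY COLUMN id      INT UNSIGNED NOT NULL,
    MODIFY COLUMN user_id INT UNSIGNED NOT NULL;
```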

For comparison, I did COUNT(*) on a table with these specs:
* Took a little over 4 seconds.
* 2.7M rows
* 160MB
* No RAID
* Windows laptop (slow drive)
* InnoDB
* 142 PARTITIONs (too many -- probably slowed it down). In a similar table, but without PARTITIONs, the query ran twice as fast.
* EXPLAIN showed that it used a secondary index -- about 93MB.
* I ignored the first time (not cached) and used the second timing.
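To get the equivalent numbers for your own table (the sizes InnoDB reports here are estimates, not exact byte counts):

```sql
-- Approximate on-disk data and index sizes, in MB, for one table.
SELECT  table_name,
        ROUND(data_length  / 1024 / 1024) AS data_mb,
        ROUND(index_length / 1024 / 1024) AS index_mb
FROM    information_schema.tables
WHERE   table_schema = 'your_db'
  AND   table_name   = 'your_table';
```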

> the innodb_log_buffer_size is 8388608, 8MB it seems to be.

Not a significant setting.

> I've learnt that we've already optimised the table (compacting it by restoring the database with a dump).

That was not likely to help much.

> Dropping unused indexes is one of our expected operations.

That will speed up INSERTs some.

> improved if we split the database into one different file per table.

The main benefit of innodb_file_per_table=1 is in maintenance -- the ability to recoup disk space after a DROP / ALTER / etc. Essentially no benefit (or harm) to performance.
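To turn it on, set it in my.cnf under the [mysqld] section. Note that it only affects tables created (or rebuilt) afterward; existing tables stay in ibdata1 until you rebuild them:

```ini
[mysqld]
innodb_file_per_table = 1
```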

