MySQL Forums
Forum List  »  Partitioning

Re: No other way than delete primary key to create partitions by single datetime columns?
Posted by: Rick James
Date: August 26, 2015 10:02AM

SELECT * FROM tbl WHERE indexed_column = 1234;
works almost as fast on a billion-row table as on a 10-row table.
This because of the way BTrees work and that you have an index on `indexed_column`.

Without the index, the query has scan the entire table, hence gets very slow as the table gets very big.

That's the simplest example of how an INDEX provides performance. For more complicated queries, the techniques may get more complex for maintaining performance as the table grows.

This forum, and moreso the companion "Performance" forum, is littered with use cases where I (and others) explain to Newbies (and Experts) how to transform their slow queries into fast ones. The main techniques are

1. The 'best' index -- often a 'composite' index. (More: and )
2. Reformulate the query
3. Redesign the schema (avoid EAV, etc)
4. (rarely) using PARITTIONs
5. (rarely) tune the VARIABLEs
6. (rarely) better hardware

#5, and #6 are effectively 1-time fixes. So, I like to find other ways to speed up a site.

"Splitting a table" is essentially equivalent to PARTITIONing, but it puts the burden of picking the table on the user. I don't know of a case for "horizontally" splitting into multiple tables. (There are some use cases for "vertical" splitting -- moving some columns into a 'parallel' table.)

There are two non-BTree index types:

FULLTEXT index -- This automatically breaks a TEXT field into "words" and indexes them.

SPATIAL index -- this uses RTrees for 2D stuff, such as testing whether a latitude+longitude is inside a given geographical region. (You need to describe the `region` via a polygon.)

But virtually all MySQL searching/sorting/etc is based on B+Trees, which allow for very efficient (at all table sizes):
* Find a row, given an indexed column(s)
* Scan consecutive (according to the index) records

When you have huge tables (too big to cache in RAM), the biggest consideration is I/O. Hence the indexing and query writing focus on how to minimize the number of _different_ blocks that need to be read from disk.

Back to your use case,
INDEX(col3, createDate) is optimal for
WHERE col3 = 123 AND createDate BETWEEN ... AND ...
regardless of the size of the table.

But, things can be tricky...
WHERE col3 > 123 AND createDate BETWEEN ... AND ...
(note the '>') is a very different use case for the optimizer. It is two "ranges". In this case, the previous composite index cannot be used. Instead two indexes:
gives the optimizer a choice; it will pick the one that is probably faster. The query will be less efficient. And it will probably be sensitive to the table size if the table is bigger than cache. But now we are into an area for which there is no generic solution. (Digging deeper into the application may lead to a trick that works well.)

You may notice that several of my blogs address problems that do not directly work well with a BTree. They (generally) require reformulating the user's code to find tricky ways to still make use of B+Trees, clustering of the PRIMARY KEY, etc.

Options: ReplyQuote

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.