Skip navigation links
# MySQL Forums :: Partitioning :: How to handle secondary keys in partitioning?

Re: How to handle secondary keys in partitioning?

Posted by: **Mattias Jonsson** ()

Date: May 05, 2010 02:20AM

OK in this case it depends on how many partitions and how many matching rows there are for 'SELECT * FROM tp WHERE sender_id=1 ORDER BY date'.

Since it is partitioned by recipient_id pruning will not kick in.

Lets say you have N partitions and M matching rows (sender_id = 1).

This means that it has to do a index ordered read from each partition, and merge sort all N rows. And while there is still matching rows a new index ordered read would be done, until all matching rows are found.

So if (M < N) i.e. more partitions than matching rows, it will need to do more index lookups (and the sum of all steps in the b-tree index will be higher).

But if (M > N) i.e. more matching rows than partitions, the number of index reads will be the same (but the sum of the b-tree depth will be slightly higher, and there will be an additional sorting cost).

So there is an overhead of index search in partitioning, since the indexes are partitioned just like the data, but this overhead is higher per row if there are more partitions than matching rows and shrinks per row as the matching row number increases.

And the cost for sorting is also added in case of partitioning.

The type of performance degradation is hard to estimate without benchmarking, it depends on how many partitions and how many matching rows are found, as well as used engine and if a covering index can be used etc.

What I would suggest is that you benchmark your load on both partitioned and non partitioned tables. And if possible, add the results to this topic!

(Note that this is only index reads we discuss, the cost for index writes should be lower for partitioned tables, since the b-tree depth is lower per partition than it would be for a non partitioned table.)

If you usually query recipient_id and sender_id independently, I would suggest you to try to also subpartition your data, perhaps like:

PARTITION BY LIST (recipient_id % 5)

SUBPARTITION BY HASH (sender_id) PARTITIONS 5

(PARTITION pREC_0 VALUES IN (0),

PARTITION pREC_1 VALUES IN (1),

PARTITION pREC_2 VALUES IN (2),

PARTITION pREC_3 VALUES IN (3),

PARTITION pREC_4 VALUES IN (4))

That way you have partitioned on both recipient_id and sender_id equally, and have a total of 25 partitions. (you could also use RANGE instead of LIST, but I used LIST to get an easy way to equally distribute both columns.)

Since it is partitioned by recipient_id pruning will not kick in.

Lets say you have N partitions and M matching rows (sender_id = 1).

This means that it has to do a index ordered read from each partition, and merge sort all N rows. And while there is still matching rows a new index ordered read would be done, until all matching rows are found.

So if (M < N) i.e. more partitions than matching rows, it will need to do more index lookups (and the sum of all steps in the b-tree index will be higher).

But if (M > N) i.e. more matching rows than partitions, the number of index reads will be the same (but the sum of the b-tree depth will be slightly higher, and there will be an additional sorting cost).

So there is an overhead of index search in partitioning, since the indexes are partitioned just like the data, but this overhead is higher per row if there are more partitions than matching rows and shrinks per row as the matching row number increases.

And the cost for sorting is also added in case of partitioning.

The type of performance degradation is hard to estimate without benchmarking, it depends on how many partitions and how many matching rows are found, as well as used engine and if a covering index can be used etc.

What I would suggest is that you benchmark your load on both partitioned and non partitioned tables. And if possible, add the results to this topic!

(Note that this is only index reads we discuss, the cost for index writes should be lower for partitioned tables, since the b-tree depth is lower per partition than it would be for a non partitioned table.)

If you usually query recipient_id and sender_id independently, I would suggest you to try to also subpartition your data, perhaps like:

PARTITION BY LIST (recipient_id % 5)

SUBPARTITION BY HASH (sender_id) PARTITIONS 5

(PARTITION pREC_0 VALUES IN (0),

PARTITION pREC_1 VALUES IN (1),

PARTITION pREC_2 VALUES IN (2),

PARTITION pREC_3 VALUES IN (3),

PARTITION pREC_4 VALUES IN (4))

That way you have partitioned on both recipient_id and sender_id equally, and have a total of 25 partitions. (you could also use RANGE instead of LIST, but I used LIST to get an easy way to equally distribute both columns.)

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.