Re: partitioning by highly skewed data
Hi,
Thank you for the response.
Mikael Ronstrom wrote:
>Subpartitioning is possible for RANGE and LIST partitions, the
>subpartitioning is then HASH partitioning.
It would be very nice if we can subpartition HASH partitions,
or do somthing like:
: create table t1 (high_skew_col int, some_low_skew_col int,...)
: partitioned by range(hashfunc(high_skew_col) mod n) ...
: subpartitioned by hash(hashfunc(some_low_skew_col))...;
and distribute subpartitions among several disks.
Perhaps, it will give us maximum performance, if future version of
MySQL can do parallel scanning of partitions ;-)
In this example, I expect that we can still enjoy the benefit of
(hash)partitioning on queries like 'select * from t1 where high_skew_col = X',
while we can distribute data more evenly among disks.
RANGE partitioning by high_skew_col + subpartitioning is not very
suitable to our situation, because our high_skew_col is actually
sparse and unevenly distributed id numbers.
When I wrote my previous post, my idea was to define
a custom hash function like:
: fn(val1, val2) mod n = hash(val2) mod k (if val1 is null)
: = hash(val1) mod (n -k) + k (if val1 is not null)
and use 'partitioned by hash(high_skew_col, some_low_skew_col)'.
But, of course, it's not very useful if we rarely use some_low_skew_col
in where clause. I must have been sleeping...
Hiroshi