MySQL Forums
Forum List  »  Optimizer & Parser

Re: Help with partially unique index
Posted by: El T
Date: July 02, 2016 03:33PM

Hi Rick,

Firstly, thanks a lot for the reply - you obviously understand what I'm getting at and these are the sorts of answers I need.

In answer to your questions, I understand the principles of a covering index, and the benefits of that are what I'm trying to achieve here. Probably my biggest concern is the performance and size impact of adding the secondary index. Can you explain what you mean by the latter index "being implemented like (d,e,a,b,c)"? It sounds like it's folded into the primary index in some manner, which is exactly what I want.

The reason for my questions is that "d" and "e" will never be queried except under the preceding limiter of the unique index. There's never, ever a query like "give me all the rows where d = blah". It is always queries of "where a = blah and b = yada, give me those rows where d = whatever". So I can already eliminate _almost_ all but the records I want in any given query. I just want to get the benefits of a covering index on those last two columns as well. And from what you're saying, it sounds like InnoDB is smart enough to do that without building an entirely separate parallel index of just d and e, using up a lot of space and, more importantly, adding maintenance overhead to keep it up to date. I wasn't sure whether I should:
1. Not have an index on d and e and let it read the actual data
2. Have an index on d and e to get the covering benefits
3. Have my secondary index be on a, b, c, d and e, and InnoDB will magically recognize the overlap between that and the primary one and do something clever about it.

It sounds like option 2 is the correct one from what you're saying. But when I do this (see the revised table definition further down), the size of the indexes reported by SHOW INDEXES jumps by around 50%, which is a big increase (or so it seems to me).
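To make sure I've understood the "(d,e,a,b,c)" point, here's how I'm picturing it (a sketch of my understanding, using my real column names, so please correct me if I've got it wrong):

```sql
-- My assumption: because the table has no explicit PRIMARY KEY, InnoDB
-- promotes the first NOT NULL unique key (account_id, user_id, category_id)
-- to the clustered index, and appends those columns to every secondary
-- index entry. So idx_status_score (status, score) would be stored as if
-- it were (status, score, account_id, user_id, category_id).
-- If that's right, a query touching only those five columns should be
-- answerable from the secondary index alone, showing "Using index" in Extra:
EXPLAIN SELECT COUNT(*)
FROM user_category_info
WHERE account_id = 1234 AND user_id = 5678 AND status = 1 AND score > 0;
```

If that picture is correct, it would explain why a separate index on just (status, score) is enough to get the covering behaviour without duplicating the unique-key columns explicitly.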

But onto the wrinkles. I do want to partition the table. It is heavily read AND written to and I see partitioning as a tool to help maintain performance (correct me if I'm wrong). All queries will be limited by account_id always and without exception, so I feel safe in partitioning by that. The table definition thus ends up like the below (I have added in the non-unique index as suggested):

CREATE TABLE `user_category_info` (
  `account_id` bigint(20) unsigned NOT NULL,
  `user_id` bigint(20) unsigned NOT NULL,
  `category_id` smallint(5) unsigned NOT NULL,
  `status` tinyint(3) unsigned NOT NULL,
  `score` int(10) unsigned NOT NULL,
  `somevalueA` int(10) unsigned DEFAULT NULL,
  `somevalueB` int(10) unsigned DEFAULT NULL,
  `somevalueC` int(10) unsigned DEFAULT NULL,
  `somevalueD` int(10) unsigned DEFAULT NULL,
  `somevalueE` int(10) unsigned DEFAULT NULL,
  `somevalueF` int(10) unsigned DEFAULT NULL,
  `somevalueG` int(10) unsigned DEFAULT NULL,
  UNIQUE KEY `udx_acc_user_cat_stat_comp` (`account_id`,`user_id`,`category_id`),
  KEY `idx_status_score` (`status`,`score`)
) ENGINE=InnoDB
PARTITION BY KEY (account_id)
PARTITIONS 2048;

There will be something like 2,000 to 2,500 unique account_ids in the system, so I'm envisioning a partition for each account. I expect something like 50 to 60 million rows in the above as an upper bound.

You were right to pick up on the account_id being NULL. I do have some NULLs in the existing data set for various reasons, but I can get rid of those (legacy data that can be archived away) to allow account_id to be NOT NULL, and I've made that change. However, does adding the partitioning mess up any of what we've discussed so far?
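As a sanity check on the partitioning side, my plan was to verify pruning like this (assuming I've understood the EXPLAIN PARTITIONS syntax correctly, and that pruning applies to PARTITION BY KEY the way I hope):

```sql
-- If pruning works as I expect, the "partitions" column of the output
-- should list exactly one partition for any query that fixes
-- account_id to a constant:
EXPLAIN PARTITIONS
SELECT category_id, score
FROM user_category_info
WHERE account_id = 1234 AND user_id = 5678 AND status > 0;
```

Since every single query includes account_id, my hope is that each one only ever touches one partition.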

Posting actual queries would involve adding a lot more table definitions and explanation because of the number of joins in most of the queries. However, if it's okay to reduce them to a version that is just a WHERE clause equivalent, you'd see a lot of queries like the following, in very large volume:

SELECT COUNT(*) FROM user_category_info WHERE account_id = 1234 AND user_id = 5678 AND status = 1 AND score > 0;
SELECT user_id, COUNT(*) FROM user_category_info WHERE account_id = 1234 AND status = 1 AND score > 0 GROUP BY user_id;
SELECT category_id, score FROM user_category_info WHERE account_id = 1234 AND user_id = 5678 AND status > 0;
UPDATE user_category_info SET somevalueC = UNIX_TIMESTAMP(), score = 99 WHERE account_id = 1234 AND user_id = 5678 AND category_id = 41;

(In practice some of these would have a bunch of joins to get information associated with the user and the category, but I think this conveys the parts relevant to this table and its indexing.)

There are more complicated ones sometimes that need to query the other columns, but 90% of the reads are answerable just from the values of account_id, user_id, category_id, status and score, which is why I'm so keen to get them into the indexes.
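For what it's worth, my plan for confirming that those 90% of reads really are index-covered is just to EXPLAIN each one and look at the Extra column (a rough check only, since I know the optimizer's choice depends on the data distribution):

```sql
-- The GROUP BY query is the one I'm least sure about: it filters on
-- status/score but groups by user_id, so I don't know which index the
-- optimizer will prefer. Either way, "Using index" in the Extra column
-- would confirm a covering read:
EXPLAIN
SELECT user_id, COUNT(*)
FROM user_category_info
WHERE account_id = 1234 AND status = 1 AND score > 0
GROUP BY user_id;
```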

Sorry if these questions are a little vague. I'm not familiar enough with the subject matter to know exactly what I should ask. But in essence, I want to know if I'm indexing this in the best possible way and if I should be concerned about the impact of adding the secondary non-unique index and how InnoDB will manage it internally.

I really appreciate you having taken the time to answer this. I have done quite a bit of reading on this, but haven't quite got the level of answers I want.
