Re: Advice on partitioning
Hi Rick
These are the tables I created for my test case
---------------------------------------------------------------------------
CREATE TABLE mail_test_key (
id bigint AUTO_INCREMENT ,
ownerid int unsigned ,
creation_date datetime(6) default now(6),
subject varchar(500),
primary key (id,ownerid)
)
ENGINE=INNODB DEFAULT CHARSET=utf8
PARTITION BY KEY( ownerid )
PARTITIONS 100;
CREATE TABLE mail_test_linear_key (
id bigint AUTO_INCREMENT ,
ownerid int unsigned ,
creation_date datetime(6) default now(6),
subject varchar(500),
primary key (id,ownerid)
)
ENGINE=INNODB DEFAULT CHARSET=utf8
PARTITION BY LINEAR KEY( ownerid )
PARTITIONS 100;
CREATE TABLE mail_test_hash (
id bigint AUTO_INCREMENT ,
ownerid int unsigned ,
creation_date datetime(6) default now(6),
subject varchar(500),
primary key (id,ownerid)
)
ENGINE=INNODB DEFAULT CHARSET=utf8
PARTITION BY hash( ownerid )
PARTITIONS 100;
CREATE TABLE mail_test_linear_hash (
id bigint AUTO_INCREMENT ,
ownerid int unsigned ,
creation_date datetime(6) default now(6),
subject varchar(500),
primary key (id,ownerid)
)
ENGINE=INNODB DEFAULT CHARSET=utf8
PARTITION BY LINEAR hash( ownerid )
PARTITIONS 100;
---------------------------------------------------------------------------
Obviously this is not the ultimate table that we will have. Our table will most likely have some more columns as
`creationDate` datetime(6) NULL DEFAULT NULL default now(6),
`CreatedById` int unsigned NOT NULL,
`LastModifiedDate` datetime(6) NOT NULL default now(6),
`LastModifiedById` int unsigned DEFAULT NULL,
`Deleted` int unsigned NOT NULL DEFAULT '0',
`senddate` datetime(6) NULL,
`ParentID` int unsigned NULL ,
`Path` varchar(255) NOT NULL DEFAULT '/',
`from` varchar(255) DEFAULT NULL,
`replyto` varchar(255) DEFAULT NULL,
`sender` varchar(255) DEFAULT NULL,
`subject` varchar(255) DEFAULT NULL,
`cc` varchar(255) DEFAULT NULL,
`bcc` varchar(255) DEFAULT NULL,
`to` varchar(255) DEFAULT NULL,
`folder` varchar(75) DEFAULT '',
`size` int unsigned DEFAULT '0',
`Draft` tinyint(3) unsigned NOT NULL DEFAULT '0',
`SavedSend` tinyint(3) unsigned NOT NULL DEFAULT '0',
`ExpirationDate` datetime(6) NOT NULL ,
`flagged` tinyint(1) NOT NULL DEFAULT '0',
`repliedto` tinyint(1) NOT NULL DEFAULT '0',
`forwarded` tinyint(1) NOT NULL DEFAULT '0',
`isspam` int unsigned NOT NULL DEFAULT '0',
`HasAttachments` tinyint(1) DEFAULT NULL,
And some indexes on creationDate,LastModifiedDate like
KEY `modified_idx` (`ownerId`,`parentID`,`LastModifiedDate`)
to allow user to sort their e-mails based modification/arrival date. Most of our queries will have the "ownerId" filter to make sure user can't modify/view other user's emails. Each user will mainly receive a lot of emails and they will constantly mark them as read/unread or move them between folders (in this case just and update on "parentId" column) or deleting them (which is an update on `Deleted` column)
This table may have a lot of records (on average 200,000 to 300,000 records per user although we may have users with 5,000,000 emails). We should be able to support databases with millions of users and scale on user basis.
For now we chose the partition to be based on id,ownerId columns and we will also have shards based on user id which distribute users across databases.
If you need more information please let me know , I can also post my java test case here if you think it will help.
Thanks for your help