MySQL Forums
Forum List  »  Partitioning

Re: Fortnight partitions
Posted by: Rick James
Date: November 20, 2012 10:56PM

> Data older than 6 months should be deleted/truncated. Would partitioning be a good solution

Yes. That is a significant use for PARTITIONs.

> partitioning on what column(s) would be useful?

PARTITION BY RANGE (the key that you use for purging). This leads to a puzzle. Nothing in your table smells like a field for determining "data older than 6 months". More later...

I would partition by week; hence about 28 partitions.

> innodb_buffer_pool_size 1073741824

How much RAM do you have? See: http://mysql.rjweb.org/doc.php/memory

> innodb_flush_log_at_trx_commit 0

2 is probably always better than 0.

> innodb_file_per_table OFF

I prefer ON.

> high freq updates/inserts into the above table around 2500 per minute.

Not very stressful.

> batch INSERTs of 1000

Good. Are there likely to be a lot of duplicates in that 1000? If so, it would possible be better to insert them into a temp table, then do INSERT ... ( SELECT ... GROUP BY ... ) ON DUPLICATE KEY ...

Hints on partition maintenance: http://mysql.rjweb.org/doc.php/partitionmaint

The table will become over 100GB? Random lookups by username or userid will be costly because it will hit the disk an average of once per partition. (Hmmmm... Maybe weekly is a bad idea.) How often are the SELECTs?

SELECT performance will not improve unless you have a the PARTITION key in the WHERE clause.

Back to purging and the "PARTITION key"... If you want to purge based on `lastupd` and that field is updated frequently, then that would be a _bad_ field for PARTITIONong. When it changes from one week (or fortnight) to another, the row mus be deleted from one partition (think of it as a table) and inserted into another. Costly.

So... I would rethink the entire design... Let's focus on a day (or week or fortnight). Let's gather the stats for that unit of time, then stop updating it when we move on to the next unit of time. This will lead to more complex INSERTs and UPDATEs and SELECTs, but we can achieve the purge-via-PARTITIONing.

SELECT becomes
SELECT MAX(lastupd), SUM(friends_cnt), SUM(messages_cnt), ...
There needs to be a new day/week/fortnight field, and the INSERT/UPDATE only uses that field. Perhaps this is instead of lastupd? (DATE is 3 bytes; BIGINT is 8.)

This design will cause a semantic change -- In the past, you counted the cnts until the latest one is more than 6 months old. My suggestion only counts the friends/messages in the last 6 months, and loses older data.

Ponder my thoughts and come back with a modified design. Then we can refine it further.

Options: ReplyQuote


Subject
Views
Written By
Posted
4175
November 19, 2012 11:24PM
Re: Fortnight partitions
2454
November 20, 2012 10:56PM
1658
November 20, 2012 11:23PM
1767
November 21, 2012 08:54PM
1709
November 23, 2012 12:04AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.