
Re: Partitioning or Not Partitioning?
Posted by: Rick James
Date: November 24, 2014 07:21PM

The main issue with a high INSERT rate is the amount of I/O that will be done. Hence,
* Shrink record size
* Minimize Indexing work
* Minimize the number of rows

Shrinking record size involves
* Replacing long, repeated, strings with short ids ("normalization")
* Shrinking datatypes (INT -> MEDIUMINT, etc)
* Minimizing number of indexes
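
For example, the first two items might look something like this (a sketch; every name other than `nomevariabile` is invented for illustration):

    -- Normalization: store the long, repeated string once; reference it by a short id.
    CREATE TABLE variables (
      variable_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
      nomevariabile VARCHAR(64) NOT NULL,
      PRIMARY KEY (variable_id),
      UNIQUE KEY (nomevariabile)
    ) ENGINE=InnoDB;
    -- The big table then carries a 2-byte variable_id instead of the string,
    -- and MEDIUMINT (3 bytes) instead of INT (4 bytes) wherever the range allows.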

Minimizing indexing work
* Each row inserted requires that _all_ indexes be updated to reflect the existence of the new row. At worst, this is more than one I/O (per row inserted). At best, the cost is amortized over multiple INSERTs, and is close to zero, but non-zero. (Wayne, I hope this paragraph answers your question. I will be happy to elaborate, preferably with a specific use case.)
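
To make that concrete, here is a hypothetical narrow-row layout (names invented) with one secondary index:

    CREATE TABLE realtime_narrow (
      device_id   MEDIUMINT UNSIGNED NOT NULL,
      ts          TIMESTAMP NOT NULL,
      variable_id SMALLINT UNSIGNED NOT NULL,
      val         FLOAT NOT NULL,
      PRIMARY KEY (device_id, ts, variable_id),  -- clustered; rows arriving in ts order append cheaply
      INDEX (variable_id, ts)                    -- secondary; every INSERT must also update this
    ) ENGINE=InnoDB;

InnoDB's change buffer usually batches those secondary-index updates behind the scenes; that is why the cost is "close to zero, but non-zero".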

Minimizing the number of rows --
* That's where I brought up 1 wide row instead of 25 narrow rows. This has the side effect of decreasing I/O because some column(s) don't need to be duplicated 25 times. Also, each row has some overhead, so that is decreased.
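
Concretely, the wide version might look like this (column names invented; only 3 of the 25 FLOATs written out):

    CREATE TABLE realtime (
      device_id MEDIUMINT UNSIGNED NOT NULL,
      ts        TIMESTAMP NOT NULL,
      v01       FLOAT NOT NULL,
      v02       FLOAT NOT NULL,
      -- ... v03 through v24 ...
      v25       FLOAT NOT NULL,
      PRIMARY KEY (device_id, ts)   -- device_id and ts stored once, not 25 times
    ) ENGINE=InnoDB;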

Now to do some computations (the numbers are guesses or estimates):
IOPs: 100/sec (How fast a typical system can read/write disk.)
InnoDB block size: 16,384 bytes (exact).
Hence, 100 InnoDB blocks (1.6MB) can be written per second.
Assume no secondary indexes.

Average row size for existing table: 100 bytes.
--> Inserting 16K rows/second will saturate the I/O system.
--> Max of 650 sets of metrics / second (after dividing by 25)
After removing nomevariabile: 90 bytes/row.
--> 17K rows/sec
--> 710 sets/sec
After using 25 FLOAT values in columns (instead of 25 rows): 250 bytes/row.
--> 6.5K rows per second inserted
--> 6500 sets/sec
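
(If you want to redo that arithmetic yourself, it is just bytes/sec = IOPs * block size, divided by row size; the numbers above are rounded:

    SELECT 100 * 16384 / 100      AS rows_per_sec_at_100B,  -- ~16,384
           100 * 16384 / 100 / 25 AS sets_per_sec_narrow,   -- ~655
           100 * 16384 / 250      AS sets_per_sec_wide;     -- ~6,554 (1 wide row = 1 set)

)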

That is, you are limited to 650/710/6500 _sets_ per second because of writing `realtime`.

Note that the "History" table has only 1/60th the I/O.

Note that I did not mention splitting things into separate tables or partitions. That is mostly irrelevant when counting disk I/Os. So, if I understand the requirements correctly, and assuming the need to persist `realtime`, the only way to achieve the desired write speed is having all 25 values in a single record.

Also note that I am not concerned with the total size of the table(s) on disk. Instead, I am concerned with number of blocks that need to be written "per second".

To make the SELECTs more efficient, you may need to add one or more indexes. This could severely cut into the INSERT performance, depending on how 'random' the indexed field(s) are.
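
For example (purely hypothetical; it depends entirely on your SELECTs, and assumes a wide `history` keyed on (device_id, minute_ts)):

    -- To scan all devices for one time range:
    ALTER TABLE history ADD INDEX (minute_ts);
    -- minute_ts arrives in order, so this index is cheap to maintain.
    -- An index on a 'random' column would dirty a different block on almost
    -- every INSERT -- that is where the performance hit comes from.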

Let me digress here. I want to know how long you keep the "realtime" table. That is, how big will that table be before it is purged? In particular, if it does not need to be too big, maybe we can keep it in RAM, thereby never incurring any I/O.

I suggested aggregating the realtime data into `history` immediately (that is, at the end of each minute). I still recommend that, and a reason will show up in a minute.
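
The roll-up itself is one statement per minute, run from cron or an EVENT. A sketch, reusing the invented wide layout from above (the literal timestamps are whatever minute just ended):

    CREATE TABLE history (
      device_id MEDIUMINT UNSIGNED NOT NULL,
      minute_ts TIMESTAMP NOT NULL,
      v01_avg   FLOAT NOT NULL,
      v02_avg   FLOAT NOT NULL,
      v25_avg   FLOAT NOT NULL,
      PRIMARY KEY (device_id, minute_ts)
    ) ENGINE=InnoDB;

    INSERT INTO history (device_id, minute_ts, v01_avg, v02_avg, v25_avg)
      SELECT device_id, '2014-11-24 19:20:00',
             AVG(v01), AVG(v02), AVG(v25)    -- ... and the columns in between
        FROM realtime
       WHERE ts >= '2014-11-24 19:20:00'
         AND ts <  '2014-11-24 19:21:00'
       GROUP BY device_id;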

IF `realtime` can be small enough to fit in RAM in a MEMORY table, then my computations above (showing that you are limited to 6500 sets/sec) are no longer useful. Instead there is no I/O for realtime, and 1/60th the I/O for history. In particular, you could handle 60*6500 = 390,000 "sets" per second.

But... MEMORY is volatile, so the data is lost in a system crash. However, if you 'roll up' the data into `history` promptly, you lose at most a minute's worth of data.
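
A sketch of that (same invented wide layout, just switched to the MEMORY engine in place of the InnoDB version above, plus a purge after each roll-up):

    CREATE TABLE realtime (
      device_id MEDIUMINT UNSIGNED NOT NULL,
      ts        TIMESTAMP NOT NULL,
      v01       FLOAT NOT NULL,
      v02       FLOAT NOT NULL,
      v25       FLOAT NOT NULL,
      PRIMARY KEY (device_id, ts)
    ) ENGINE=MEMORY;

    -- After each minute has been rolled up into `history`, purge it:
    DELETE FROM realtime WHERE ts < '2014-11-24 19:21:00';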

Again, I have not mentioned splitting into multiple tables or partitions, because it does not impact the computations. (Actually, keeping 6000 tables open may have a _negative_ impact on performance!)

Something to consider in `history`. As with stock prices, is it worth computing and storing MIN, MAX, 'open', and 'close', in addition to AVG? ('open' and 'close' are the first and last values for temperature or whatever.)
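
If so, here is one way to get open/close in MySQL without extra passes, shown for one metric (same invented names; watch group_concat_max_len if a minute has many readings):

    SELECT device_id,
           AVG(v01) AS v01_avg,
           MIN(v01) AS v01_min,
           MAX(v01) AS v01_max,
           SUBSTRING_INDEX(GROUP_CONCAT(v01 ORDER BY ts ASC),  ',', 1) AS v01_open,
           SUBSTRING_INDEX(GROUP_CONCAT(v01 ORDER BY ts DESC), ',', 1) AS v01_close
      FROM realtime
     WHERE ts >= '2014-11-24 19:20:00'
       AND ts <  '2014-11-24 19:21:00'
     GROUP BY device_id;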

Back to an issue that I have not yet addressed. What to do when a 26th metric is added? Adding a column to a large table (ALTER TABLE history ADD COLUMN ...) is a costly operation unless you have a recent version of MySQL. What version are you running?
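
The statement itself is trivial; the cost is in the rebuild:

    ALTER TABLE history ADD COLUMN v26_avg FLOAT NOT NULL DEFAULT 0;
    -- Before 5.6, InnoDB copies the whole table and blocks writes while doing it.
    -- 5.6 still rebuilds the table, but can do it 'online' (ALGORITHM=INPLACE,
    -- LOCK=NONE), so the INSERTs keep flowing.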

But there is another thing to consider... Perhaps the metrics can be grouped in some logical, application-specific, way. Example: Environment (temp, dew point, ...); Electrical (voltage, amperage, ...); etc. Then split `history` (and `realtime`?) into sub-tables along those lines. (This split is called "Vertical partitioning"; "PARTITION" is "Horizontal" partitioning.)
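
A sketch of that vertical split (the groupings and names are invented):

    CREATE TABLE history_environment (
      device_id     MEDIUMINT UNSIGNED NOT NULL,
      minute_ts     TIMESTAMP NOT NULL,
      temp_avg      FLOAT NOT NULL,
      dew_point_avg FLOAT NOT NULL,
      PRIMARY KEY (device_id, minute_ts)
    ) ENGINE=InnoDB;

    CREATE TABLE history_electrical (
      device_id    MEDIUMINT UNSIGNED NOT NULL,
      minute_ts    TIMESTAMP NOT NULL,
      voltage_avg  FLOAT NOT NULL,
      amperage_avg FLOAT NOT NULL,
      PRIMARY KEY (device_id, minute_ts)
    ) ENGINE=InnoDB;

    -- Pulling columns from both groups needs a JOIN on the shared key:
    SELECT e.temp_avg, el.voltage_avg
      FROM history_environment AS e
      JOIN history_electrical  AS el USING (device_id, minute_ts);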

There are advantages and disadvantages in doing that sort of clumping:
* The INSERT cost will go up slightly. (But this may not be a serious problem once you get rid of I/O for `realtime`.)
* The SELECTs may be more logical -- if you can find all the data in a single table. Or...
* The SELECTs may be messier -- if you need to JOIN to get columns from multiple tables.
* Adding a 26th column means ALTERing one of the sub-tables. This is slightly cheaper than ALTERing the full-sized `history`. Or...
* Adding a 26th column might mean adding a new sub-table (near zero cost).
* Having MIN/MAX/open/close makes for a bulky un-split table versus a not-so-bulky set of sub-tables. (No, 5 columns per metric, times 25 metrics, at 4 bytes each, will not hit any database limits.)

In any case, we need to discuss the SELECTs:
* INDEXing
* PARTITIONing
* Vertical partitioning
* Potential conflicts with the INSERTing
* and other performance issues


I wanted this comment to be focused on "counting the IOPs", getting rid of `realtime` I/O, and a couple of related issues. In doing so, I may have failed to answer some of your comments. Manuel, feel free to ask the remaining questions again. Also, I introduced some 'new' topics (eg, MEDIUMINT); feel free to ask for more details.
