MySQL Forums
Forum List  »  Newbie

Re: Partitioning or Not Partitioning?
Posted by: Rick James
Date: November 29, 2014 03:42PM

More on the "staging" concept...

The moving of data from `Realtime` to `History` is perhaps the second worst bottleneck you are facing. (#1 is INSERTing into `Realtime`, but that is mostly 'solved'.)

I want to discuss:

INSERT INTO History (client_id, avg_temp, avg_volt, ..., timestamp)
SELECT client_id, AVG(temp), AVG(volt), ..., AVG(timestamp)
FROM `Realtime`
GROUP BY client_id;
followed immediately by some kind of DELETE or TRUNCATE.

* That pair of statements is run once a minute.
* It INSERTs 6000 rows into one table; this is not a performance problem.
* The SELECT part reads 60*6000 rows; this is something to be concerned about.
* And the DELETE/TRUNCATE is also a concern.

The SELECT probably takes more than a second. That is not bad for doing once per minute, but it is bad in that 6000 INSERTs are trying to write to the same table in that second. The exact details of this conflict depend on the ENGINE and a variety of other things.

If each INSERT is a separate connection, you might need max_connections to be bigger than 6000. This is _not_ good. It is _bad_.

If the INSERTs are delayed a second, then the data is getting stale. This is not good.

After being backed up for a second, how long will it take to "catch up"? Again, not good.

So, we really need to avoid that big SELECT on `Realtime`. But how? I already hinted at calling it a `Staging` table. Read
http://mysql.rjweb.org/doc.php/staging_table
and then I'll discuss what I am thinking...

INSERTs write to `Staging` (formerly called `Realtime`). A separate process wakes up every minute and does the following:
* CREATE TABLE new ...
* RENAME TABLE Staging TO StageProcess, new TO Staging;
* Do the INSERT..SELECT AVG... FROM StageProcess ...;
* DROP TABLE StageProcess; -- instead of DELETE or TRUNCATE

The only impact on `Staging` is the RENAME, which should be much faster than 1 second. It won't be "instantaneous", but it should be fast enough to block only a few INSERTs. The RENAME is atomic, so nothing will be 'lost' in the process.

This separate process can take as long as it needs to (up to one minute) to do the INSERT..SELECT, so that is no problem unless some other SELECT is reading lots of stuff from `History` and somehow blocking for more than a minute.

These steps cannot be done exactly once a minute, so sometimes there will be 61 rows for some clients or 59 rows, etc. Considering the type of data (averages of physical measurements), this lack of precision should not matter. Correct? Any attempt to make the handoff happen at exactly 60 seconds for all 6000 clients will add so much overhead as to threaten the performance.

I previously suggested having another table for `Current` values. The UPDATE into that table is an overhead that we _might_ be able to avoid. Here's a thought...
* Have an index on Staging so that it the UI can find the 'current' value.
* However, since there is a second during which Staging is empty, the UI will find no current value.
* One solution is to have the UI wait a second, then try again. If this is acceptable, then I would recommend it, since it has the least impact on the rest of Staging/StageProcess.

If this is the desired "current" query for the UI:

SELECT ... FROM Staging
WHERE client_id = ?
ORDER BY timestamp DESC
LIMIT 1;

then, INDEX(client_id, timestamp)
should be optimal.

Assuming Staging is MEMORY, then this needs to be before the CREATE TABLE:
SET max_heap_table_size = ...;
The value to use needs to be empirically determined, perhaps somewhere around 100000000.
Since that eats away at RAM, twice that needs to be subtracted innodb_buffer_pool_size.

Meanwhile, let's hope that this will suffice:
max_connections = 100

You really need to test this at full scale (6000 clients, etc) to validate everything I am proposing. You should even see if it will work at twice that scale -- to give you some 'head room' for contingencies and growth.

Options: ReplyQuote


Subject
Written By
Posted
Re: Partitioning or Not Partitioning?
November 29, 2014 03:42PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.