
Re: Large Table Architecture
Posted by: Rick James
Date: May 25, 2011 11:31PM

Caching...
* You are using MyISAM.
* 99GB of RAM
* key_buffer_size -- default of 8M is much too small
* You have a recent version (older ones limited to 4G for key_buffer)
Therefore, set key_buffer_size = 20G.
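
Roughly, something like this (a sketch; the my.cnf entry is what makes it stick across restarts):

SET GLOBAL key_buffer_size = 20 * 1024 * 1024 * 1024;   -- takes effect immediately

-- and in my.cnf, under [mysqld]:
-- key_buffer_size = 20G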

If this is the only "reporting" query you have:
SELECT COUNT('x'), COUNT(DISTINCT OPID), `Action`, Campaign 
FROM Mydb.mytablename
GROUP BY Campaign, `Action`;
then the answer is easy. Build a Summary table with something like...
Well, there are two pieces missing from the equation.
* What indexes are on the main table? (awaiting SHOW CREATE TABLE)
* How much does OPID vary? (I will assume it is not very unique)

CREATE TABLE CAOSummary (
    dy DATE NOT NULL,
    Campaign ...,
    Action ...,
    OPID ...,
    ct MEDIUMINT UNSIGNED NOT NULL,
    PRIMARY KEY (dy, Campaign, Action, OPID)
)
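
Just for illustration (the real datatypes depend on the SHOW CREATE TABLE we are still waiting for), a filled-in version might look like:

CREATE TABLE CAOSummary (
    dy DATE NOT NULL,
    Campaign VARCHAR(50) NOT NULL,     -- guess; copy the main table's definition
    Action VARCHAR(20) NOT NULL,       -- guess
    OPID INT UNSIGNED NOT NULL,        -- guess
    ct MEDIUMINT UNSIGNED NOT NULL,
    PRIMARY KEY (dy, Campaign, Action, OPID)
) ENGINE=MyISAM;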

Every night, after all the day's data has been loaded, run a giant
INSERT INTO CAOSummary
    SELECT  DATE(Timestamp) as dy,
            Campaign, Action, OPID,
            COUNT(*) as ct
        FROM RESP_CA_MasterSent_split
        WHERE Timestamp >= DATE_SUB(CURRENT_DATE, INTERVAL 1 DAY)
          AND Timestamp <  CURRENT_DATE 
        GROUP BY dy, Campaign, Action, OPID;
Your reporting SELECT will read FROM CAOSummary.
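
That is, something like this sketch -- SUM(ct) replaces COUNT('x'), and COUNT(DISTINCT OPID) still works because the summary keeps OPID as part of its key:

SELECT SUM(ct), COUNT(DISTINCT OPID), `Action`, Campaign
FROM CAOSummary
GROUP BY Campaign, `Action`;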

Meanwhile, the only index on RESP_CA_MasterSent_split will be
INDEX(Timestamp)
The current INDEX is causing a slowdown because
* It is rather random, and
* key_buffer was _tiny_, hence virtually no caching.
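
Once we see SHOW CREATE TABLE, the change would be along these lines (the index name below is a placeholder for whatever is there now; note that an ALTER on a table this size rebuilds it, so schedule it accordingly):

ALTER TABLE RESP_CA_MasterSent_split
    DROP INDEX current_index_name,     -- placeholder; drop whatever is there now
    ADD INDEX (Timestamp);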

The INDEX(Timestamp) will need essentially no caching because you are inserting rows in (virtually) that order.

The Summary table, on the other hand, will be jumping around to update its index. But it should be a much smaller table, hence more cacheable, etc.

With 99GB of RAM, my suggestions of normalizing Campaign, etc, won't make a lot of difference. But at your current growth rate, it will become important. (And it will be faster to 'fix' the situation now, rather than later.)
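
For the record, a sketch of what normalizing Campaign would look like (names and sizes are guesses; the same pattern applies to Action, etc):

CREATE TABLE Campaigns (
    campaign_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    Campaign VARCHAR(50) NOT NULL,     -- guess; match the main table's type
    PRIMARY KEY (campaign_id),
    UNIQUE KEY (Campaign)
);
-- The big table (and the Summary) would then store the 2-byte campaign_id instead of the string.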

Potential problems:
* Other reports you have not yet mentioned.
* Need to reload yesterday's data (see the re-run sketch after this list)
* Failure to run the summarization script some night (same fix applies)
* System crash -- will take a long time to REPAIR such a big table. (Again, normalizing will help.)
* OPID being not very unique. (This would lead to the Summary table not being much smaller than the 'Fact' table.)
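
Both the reload and the missed-night cases are handled by making the summarization re-runnable for any given day; a sketch ('2011-05-24' is just an example date):

DELETE FROM CAOSummary
    WHERE dy = '2011-05-24';
INSERT INTO CAOSummary
    SELECT  DATE(Timestamp) as dy,
            Campaign, Action, OPID,
            COUNT(*) as ct
        FROM RESP_CA_MasterSent_split
        WHERE Timestamp >= '2011-05-24'
          AND Timestamp <  '2011-05-24' + INTERVAL 1 DAY
        GROUP BY dy, Campaign, Action, OPID;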

How do you do the INSERTs? Batched? LOAD DATA? Via another table? Does the data come in all at once, or trickle in throughout the day?
