As of right now, that reporting query is the only one. They haven't decided yet how else to use the data, but will probably use it to build other reports that have yet to be determined.
I did change the key buffer to 20GB per your suggestion.
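For reference, that corresponds to this in my.cnf (assuming it was set in the config file rather than at runtime):

[mysqld]
key_buffer_size = 20G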
How do you do the INSERTs? Batched? LOAD DATA? Via another table? Does the data come in all at once, or trickle in throughout the day?
* It is currently loaded via a LOAD DATA script in our ETL tool, which grabs a file every morning. Normalizing should be pretty easy to set up with lookup tables for the static content (ACTIONS); see the sketch below.
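Something like this is what I have in mind for the ACTIONS lookup (a minimal sketch; Action_Lookup and its column names are placeholders of mine):

CREATE TABLE Action_Lookup (
  ActionID TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
  Action VARCHAR(15) NOT NULL,
  PRIMARY KEY (ActionID),
  UNIQUE KEY (Action)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;

-- Seed it once from the data already loaded:
INSERT INTO Action_Lookup (Action)
SELECT DISTINCT Action FROM RESP_CA_MasterSent_List WHERE Action IS NOT NULL;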
SHOW CREATE TABLE:
CREATE TABLE `RESP_CA_MasterSent_List` (
  `Email` varchar(125) COLLATE latin1_general_ci DEFAULT NULL,
  `Campaign` varchar(75) COLLATE latin1_general_ci DEFAULT NULL,
  `Action` varchar(15) COLLATE latin1_general_ci DEFAULT NULL,
  `Description` varchar(75) COLLATE latin1_general_ci DEFAULT NULL,
  `Timestamp` datetime DEFAULT NULL,
  `LaunchID` int(10) DEFAULT NULL,
  `OPID` varchar(15) COLLATE latin1_general_ci DEFAULT NULL,
  `InFile` varchar(25) COLLATE latin1_general_ci DEFAULT NULL,
  KEY `IDX_MasterSent_CampAction` (`Campaign`,`Action`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci
This is currently the table the query mentioned runs against; I will be breaking it down into normalized tables as well.
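The normalized version I'm planning looks roughly like this (a sketch only; the ID column types, and the Email_Lookup/Campaign_Lookup tables they would reference, are placeholders I'd size to the real cardinalities):

CREATE TABLE RESP_CA_MasterSent_Norm (
  EmailID INT UNSIGNED NOT NULL,          -- id from an Email_Lookup table
  CampaignID SMALLINT UNSIGNED NOT NULL,  -- id from a Campaign_Lookup table
  ActionID TINYINT UNSIGNED NOT NULL,     -- id from Action_Lookup above
  `Timestamp` datetime DEFAULT NULL,
  `LaunchID` int(10) DEFAULT NULL,
  `OPID` varchar(15) DEFAULT NULL,
  `InFile` varchar(25) DEFAULT NULL,
  KEY IDX_Norm_CampAction (CampaignID, ActionID)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;

Swapping the wide varchars for small integer IDs should shrink both the data rows and the (Campaign, Action) index considerably.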
The OPID values vary quite a bit, but they do repeat. For instance, of the three values the ACTION column can take, one person (OPID) will have one SENT row per campaign, but could also have rows for one or both of the other two values.
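If I want to sanity-check that shape against the loaded table, an ad-hoc query like this should do it (written without naming the other two ACTION values):

SELECT OPID, Campaign,
       SUM(Action = 'SENT') AS SentRows,
       SUM(Action <> 'SENT') AS OtherActionRows
FROM RESP_CA_MasterSent_List
GROUP BY OPID, Campaign;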
I have one table loaded with all the data (the table mentioned above); this allows me to experiment with various techniques in parallel until I get it right. From start to finish I can run the load process in about 8 hrs, probably faster if I switch to bulk loading.
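If I do switch to bulk loading, I'd try the standard MyISAM approach of suspending index maintenance during the load (the file path and field terminators below are illustrative, not our actual feed):

ALTER TABLE RESP_CA_MasterSent_List DISABLE KEYS;

LOAD DATA INFILE '/path/to/daily_extract.csv'
INTO TABLE RESP_CA_MasterSent_List
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';

-- Rebuilds the nonunique index in one pass instead of row by row:
ALTER TABLE RESP_CA_MasterSent_List ENABLE KEYS;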
I like your idea of utilizing the timestamp field as an index, so next I will implement the normalization and summary table process as described.
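Concretely, I'm picturing an index on the timestamp plus a daily rollup after each load, along these lines (CampAction_Daily and its column names are placeholders of mine):

ALTER TABLE RESP_CA_MasterSent_List ADD KEY IDX_MasterSent_Timestamp (`Timestamp`);

CREATE TABLE CampAction_Daily (
  ActionDate DATE NOT NULL,
  Campaign VARCHAR(75) NOT NULL,
  Action VARCHAR(15) NOT NULL,
  Cnt INT UNSIGNED NOT NULL,
  PRIMARY KEY (ActionDate, Campaign, Action)
) ENGINE=MyISAM;

-- After each morning load, fold in just the newly loaded rows:
INSERT INTO CampAction_Daily (ActionDate, Campaign, Action, Cnt)
SELECT DATE(`Timestamp`), Campaign, Action, COUNT(*)
FROM RESP_CA_MasterSent_List
WHERE `Timestamp` >= CURDATE() - INTERVAL 1 DAY
  AND Campaign IS NOT NULL AND Action IS NOT NULL
GROUP BY DATE(`Timestamp`), Campaign, Action
ON DUPLICATE KEY UPDATE Cnt = Cnt + VALUES(Cnt);

That way the reporting query can hit the small summary table instead of scanning the big one.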