Re: New to design; how to optimize 3 column single table with high I/O?
Rick James Wrote:
-------------------------------------------------------
> > per user per event
> Eh? I don't see any column for "event"?
>
> Can you really guarantee that you won't insert two
> rows with the same millisecond? Of so, then
>
> CREATE TABLE foo (
> userid MEDIUMINT UNSIGNED NOT NULL,
> timestmp BIGINT UNSIGNED NOT NULL, -- or
> DOUBLE
> state ENUM('yes', 'no') NOT NULL,
> PRIMARY KEY(userid, timestmp) -- Here's your
> index.
> ) ENGINE=InnoDB;
>
> CREATE TABLE Users (
> userid MEDIUMINT UNSIGNED NOT NULL
> AUTO_INCREMENT,
> ..., -- info about the user
> PRIMARY KEY (userid)
> ) ENGINE=InnoDB;
>
>
> Any query that includes WHERE userid=... will be
> quite efficient. In particular, for
> > For User X at Time X, Yes or No?
>
> SELECT state
> FROM foo
> WHERE userid = ...
> AND timestmp <= ...
> ORDER BY timestmp DESC
> LIMIT 1;
>
> Very efficient.
>
> If you are starting with a username, then a JOIN
> would be indicated.
Thank you so much for your response.
I have a couple of questions I'd like to ask.
(1) I cannot guarantee that I will not insert unique times; multiple user events will occur and they'll happen with a probability of n users/event numbers. Given a few million records, would it make sense to create a numbered first column that is nothing more than a unique key (if I understand the notion of unique keys correctly in MySQL)? The useful life of each record is no more than a few hours; I will probably implement a cleanup of all records with a timestamp of >24 hours as part of the design.
(2) Why have you chosen an ENUM instead of a binary for the state? It will never be other than 1/0. Is that the most efficient type?
(3) Does it make sense to index by username and timestamp given my description? It seems like a lot of work to index by timestamp when it doesn't seem like it will provide any more useful information than the user_id will.
(4) There may be a necessity to add another column to the db. I mentioned events; the db would now look like:
UNIQUE_ID, USER_ID, EVENT_ID, TIMESTAMP, STATE
That event_id won't be part of the high I/O; it will be checked later, though. Each of the timestamps + state updates will be occurring during an event. Now, given this information, does it make sense to index by user_id first and event_id second instead of by timestamp? I'd give a ballpark figure of 350 timestamp + state updates during each event.
(5) I just went and looked up InnoDB; it is a product of Oracle and it looks like Oracle has substituted InnoDB for MyISAM as the default storage engine in MySQL 5.5. Why are you recommending that I use InnoDB instead of MyISAM here?