> What is the average length of `Value`? CHAR(8)
> always takes 8 bytes; VARCHAR(8) would take
> 1+avg.
Very interesting.
SELECT AVG(LENGTH(Value)) FROM `RemoteStatuses`
gives (after a while...) 4.4. So if there is little advantage in keeping a fixed row length, this column could be made into a VARCHAR, which on average would take 4.4 + 1 = 5.4 bytes rather than 8. That would save 2.6 bytes per row in the data, but nothing in the index. I may give this a try, although converting to a 4-byte FLOAT might be even better (see later).
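As a sketch of what the conversion might look like (assuming the column really is called Value in RemoteStatuses, as above; NULL-ability and character set would need to match the existing definition):

ALTER TABLE `RemoteStatuses` MODIFY `Value` VARCHAR(8);
-- On MyISAM this switches the table from fixed to dynamic row format,
-- so the actual saving should be checked afterwards:
SHOW TABLE STATUS LIKE 'RemoteStatuses';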
> If you are always querying this way:
> > WHERE RemoteID = X AND DataID = Y
> then you may as well 'normalize' that pair of
> values in another table, replacing the two columns
> with a SMALLINT UNSIGNED or MEDIUMINT UNSIGNED
> that is the id for the pair. That would shave 3-4
> bytes off your 19-byte records.
The two are almost always queried together. RemoteID is the datalogger, and DataID is the piece of data, e.g. wind speed = 130, air temperature = 200. RemoteID might be replaceable with a MEDIUMINT, but DataID cannot really be any smaller than the SMALLINT it already is. Replacing the two columns with a single joint ID might be possible, but I would guess it would need to be at least an INT.
Option 1: Change RemoteID from INT to MEDIUMINT. Would save 1 byte in the index and 1 byte in the data. Few or no application-level changes required.
Option 2: Replace RemoteID and DataID with a surrogate key. Assuming the new key is an INT, this would save 2 bytes in the index and 2 bytes in the data. Many application-level changes required.
Option 3: Limit RemoteID to the range up to 2^19 and DataID to 2^13. Together they could be packed into a single INT in a predictable way. This would save 2 bytes in the index and 2 bytes in the data. It would require many application-level changes, but probably fewer than option 2, as I could simply bit-shift the data (sketched below).
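A rough sketch of the bit-shifting in option 3, assuming the packed column is called PointID (a hypothetical name) and the 19/13 split above:

-- Packing when inserting: PointID = (RemoteID << 13) | DataID
-- Querying the pair RemoteID = 42, DataID = 130:
SELECT * FROM `RemoteStatuses` WHERE PointID = (42 << 13) | 130;
-- Unpacking for display (8191 = 2^13 - 1):
SELECT PointID >> 13 AS RemoteID, PointID & 8191 AS DataID FROM `RemoteStatuses`;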
> > dropping the partition which is 12 months old
> That, alone, is worth PARTITIONing for. (There
> may be no other advantages in PARTITIONing. The
> one SELECT you showed will not run any faster.)
All of the queries have been optimized to use BETWEEN so that MySQL can prune to the correct partition(s). This has improved the speed of SELECT queries. In general, only the latest partition is updated, and it sees the most queries.
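For reference, the two things partitioning is buying here look roughly like this; the RANGE-on-date scheme, the column name ReadTime and the partition name p201001 are my placeholders, not the actual schema:

-- A date-bounded query the optimizer can prune to one or two partitions;
-- EXPLAIN PARTITIONS confirms which partitions are actually touched:
EXPLAIN PARTITIONS
SELECT * FROM `RemoteStatuses`
WHERE RemoteID = 42 AND DataID = 130
  AND ReadTime BETWEEN '2011-01-01' AND '2011-01-31 23:59:59';

-- Purging the 12-month-old data is a near-instant metadata operation,
-- far cheaper than a huge DELETE:
ALTER TABLE `RemoteStatuses` DROP PARTITION p201001;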
> > batches of around 100
> Good. Is there any issue with INSERT speed? (I
> suspect not.)
Inserts are normally fairly quick.
> > "INSERT IGNORE" statement
> 'IGNORE' -- just in case you reload the data??
Very occasionally the loggers send the same data twice: I want MySQL to bin the identical value based on the duplicate primary key, without throwing an error.
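For completeness, the batched statement looks roughly like this, assuming the primary key covers (RemoteID, DataID, ReadTime), where ReadTime is my placeholder for the timestamp column; in practice each statement carries around 100 value tuples:

INSERT IGNORE INTO `RemoteStatuses` (RemoteID, DataID, ReadTime, Value)
VALUES
  (42, 130, '2011-06-01 10:00:00', '12.3'),
  (42, 200, '2011-06-01 10:00:00', '-4.5'),
  (42, 130, '2011-06-01 10:00:00', '12.3');  -- a re-sent reading: silently skipped on the duplicate PK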
> MyISAM will drill down the BTree in the .MYI, then
> do a random read into the .MYD.
> InnoDB would drill down the BTree in the PK/data,
> and find the entire record at the leaf node.
I was put off InnoDB by its much larger footprint, and by the assumption that if the index is larger, disk seeks will be slower and the index is less likely to fit in memory. I will try converting a table to InnoDB and see how it performs.
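A sketch of the test I have in mind, converting a copy so the live MyISAM table is untouched (RemoteStatuses_innodb is just a scratch name):

CREATE TABLE `RemoteStatuses_innodb` LIKE `RemoteStatuses`;
ALTER TABLE `RemoteStatuses_innodb` ENGINE = InnoDB;
INSERT INTO `RemoteStatuses_innodb` SELECT * FROM `RemoteStatuses`;

-- Compare Data_length / Index_length, then time the typical SELECTs on both:
SHOW TABLE STATUS LIKE 'RemoteStatuses%';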
> Are you running 32-bit? or 64-bit?
64-bit.
> That's just 1 month? And you plan to have 12 months?
Yes
> Changing myisam_data_pointer_size from the default
> of 6 to 5 would shrink the index by 275MB. (3GB
> for a year)
This is an excellent idea. Is it possible to specify this on a per-table basis by specifying MAX_ROWS, rather than changing the global my.cnf?
Is there a similar optimisation for InnoDB?
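For what it's worth, my understanding (to be confirmed) is that MyISAM sizes the data pointer from the table-level MAX_ROWS / AVG_ROW_LENGTH options, so the per-table version would look something like this; the numbers are illustrative only:

ALTER TABLE `RemoteStatuses` MAX_ROWS = 1000000000, AVG_ROW_LENGTH = 19;
-- Check the resulting file sizes afterwards:
SHOW TABLE STATUS LIKE 'RemoteStatuses';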
> > Value is a char(8) because sometimes the data
> > might not be numeric
> You would probably be better off having two tables
> -- one for numeric data (with a 4-byte FLOAT), and
> another for non-numeric (VARCHAR(8) - 1-9 bytes).
> The added complexity of fetching from two tables
> might be worth the space/speed savings.
This is certainly worth considering, and would not take too many application-level changes. It would save 4 bytes on each numeric row, but nothing on the index.
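A rough sketch of the split; the table names, the ReadTime column and the exact key definition are placeholders, and everything else would mirror the current table:

-- Numeric readings: 4-byte FLOAT instead of CHAR(8)
CREATE TABLE `RemoteStatusesNumeric` (
  RemoteID INT UNSIGNED      NOT NULL,
  DataID   SMALLINT UNSIGNED NOT NULL,
  ReadTime DATETIME          NOT NULL,
  Value    FLOAT             NOT NULL,
  PRIMARY KEY (RemoteID, DataID, ReadTime)
) ENGINE = MyISAM;

-- The occasional non-numeric readings keep a variable-length string
CREATE TABLE `RemoteStatusesText` (
  RemoteID INT UNSIGNED      NOT NULL,
  DataID   SMALLINT UNSIGNED NOT NULL,
  ReadTime DATETIME          NOT NULL,
  Value    VARCHAR(8)        NOT NULL,
  PRIMARY KEY (RemoteID, DataID, ReadTime)
) ENGINE = MyISAM;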