Re: physical bytes
Posted by:
Rick James
Date: September 12, 2011 11:22AM
Let's throw another factor into the mix. When you UPDATE one row, are you also updating a million other rows at once? Are the rows likely to be 'near' each other?
"Count the disk hits".
For a trillion rows, performance is all about counting the disk hits. CPU, RAM size, caching, etc are all blown away by the need for hitting the disk for _every_ row.
Unless...
Unless the rows you hit can, due to the characteristics of the data and of the application, be "adjacent".
Fetching one million _random_ rows will be (approximately) one million disk hits. On ordinary drives, this will take 2-3 hours.
Fetching one million _consecutive_ rows will take more like 10K disk hits. (The value varies with Engine, schema, etc, etc.) ~2 minutes.
Rules of thumb:
RAID striping (-1, -5, -6, -10): Divide the time by the striping factor.
SSDs: Divide the time by 10.
To update a trillion VARCHARs, or even CHARs, will take weeks even under optimal accesses. It will take years in the worst case (random access, grow the length, etc)
Let's switch the metric to 'frequency'. How often would you like to UPDATE a VARCHAR? 10 times a second -- no problem. Count the disk hits: maybe 4 per row: MyISAM: 2 for locating record and deleting it from there, 2 more for writing elsewhere. InnoDB: 2 -- it will stay in the same block (block splits would be rare).
Caveat: In my calculations, I am assuming that non-leaf nodes are fully cached and that leaf nodes are not cached long enough to matter. You haven't shows the Engine or the Schema, so I can't even guess how big the non-leaf nodes will be or whether they will fit in RAM. With 1T rows, there is some chance that non-leaf nodes cannot be fully cached; this would add to the number of disk hits, especially for random operations.
Back to 1T and CHAR/VARCHAR.
* Avoid utf8 except where it is necessary.
* Calculate the table size for CHAR(10) versus VARCHAR -- to see how much disk space CHAR(10) would force on you. Also, find out if InnoDB would trim CHAR anyway.
* Keep in mind the radical differences between InnoDB and MyISAM.
* Consider BINARY/VARBINARY.
* Can you avoid the UPDATE you describe?
* Consider ENUM instead of CHAR/VARCHAR. It takes 1 byte (up to 255 options), or 2 bytes.