Re: physical bytes
Posted by: Rick James
Date: September 12, 2011 11:22AM

Let's throw another factor into the mix. When you UPDATE one row, are you also updating a million other rows at once? Are the rows likely to be 'near' each other?

"Count the disk hits".

For a trillion rows, performance is all about counting the disk hits. CPU, RAM size, caching, etc are all blown away by the need for hitting the disk for _every_ row.

Unless...

Unless the rows you hit can, due to the characteristics of the data and of the application, be "adjacent".

Fetching one million _random_ rows will be (approximately) one million disk hits. On ordinary drives, this will take 2-3 hours.
Fetching one million _consecutive_ rows will take more like 10K disk hits. (The value varies with Engine, schema, etc, etc.) ~2 minutes.

Rules of thumb:
RAID striping (-1, -5, -6, -10): Divide the time by the striping factor.
SSDs: Divide the time by 10.

To update a trillion VARCHARs, or even CHARs, will take weeks even under optimal accesses. It will take years in the worst case (random access, grow the length, etc)

Let's switch the metric to 'frequency'. How often would you like to UPDATE a VARCHAR? 10 times a second -- no problem. Count the disk hits: maybe 4 per row: MyISAM: 2 for locating record and deleting it from there, 2 more for writing elsewhere. InnoDB: 2 -- it will stay in the same block (block splits would be rare).

Caveat: In my calculations, I am assuming that non-leaf nodes are fully cached and that leaf nodes are not cached long enough to matter. You haven't shows the Engine or the Schema, so I can't even guess how big the non-leaf nodes will be or whether they will fit in RAM. With 1T rows, there is some chance that non-leaf nodes cannot be fully cached; this would add to the number of disk hits, especially for random operations.

Back to 1T and CHAR/VARCHAR.
* Avoid utf8 except where it is necessary.
* Calculate the table size for CHAR(10) versus VARCHAR -- to see how much disk space CHAR(10) would force on you. Also, find out if InnoDB would trim CHAR anyway.
* Keep in mind the radical differences between InnoDB and MyISAM.
* Consider BINARY/VARBINARY.
* Can you avoid the UPDATE you describe?
* Consider ENUM instead of CHAR/VARCHAR. It takes 1 byte (up to 255 options), or 2 bytes.

Options: ReplyQuote


Subject
Written By
Posted
September 07, 2011 02:03PM
September 08, 2011 08:30PM
September 08, 2011 10:30PM
September 09, 2011 06:27PM
September 11, 2011 12:18PM
Re: physical bytes
September 12, 2011 11:22AM
September 13, 2011 10:33PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.