
Re: RAM consideration/ IO usage
Posted by: Rick James
Date: January 14, 2015 01:47PM

I have seen servers running just fine with datasets 10 or 100 times as big as the buffer_pool. I have seen tiny datasets melt down the server because of poorly designed indexes or poorly written SELECTs.

While it is good to have enough RAM to cache everything, it may not be necessary. You may want to spend your money somewhere else.

InnoDB _caches_ blocks in memory; all the blocks persist on disk. The caching is used to minimize the I/O.

If the table, index, or "working set" of the data is small enough to fit in innodb_buffer_pool_size, then eventually (once the blocks have been cached) there will be no I/O for reading.
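
A rough way to verify this on a running server (a sketch; the counters are cumulative since startup):

SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
-- Innodb_buffer_pool_read_requests = logical reads, served from the cache
-- Innodb_buffer_pool_reads = reads that had to go to disk
-- Once the working set is cached, Innodb_buffer_pool_reads should barely grow.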

There is always I/O for writing, but it is 'delayed' in hopes of making multiple updates to a block before actually writing it to disk.
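
You can watch the delayed writes pile up as "dirty" pages -- blocks changed in the buffer_pool but not yet written back (one way to observe it, not the only one):

SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';
-- the number of blocks modified in RAM but not yet flushed to disk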

Also there is I/O (writes) to log things to maintain ACID transactional integrity. This happens on each COMMIT (or implicit COMMIT when autocommit=1).
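
That per-COMMIT flush is governed by innodb_flush_log_at_trx_commit (1, the default, flushes the log on every COMMIT and is the fully ACID setting). To see what you are running with:

SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SELECT @@autocommit;  -- 1 means each standalone statement is its own transaction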

Blocks in InnoDB (data or index) are 16KB. (This size can be changed, but it is extremely rare that anyone does so.)
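
On 5.6 and later you can check it directly (16384 unless someone went out of their way to change it):

SHOW GLOBAL VARIABLES LIKE 'innodb_page_size';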

The Data and the PRIMARY KEY live together in a BTree.
Each secondary key lives in its own BTree.
(Actually, they are B+Trees, which facilitate "range scans".)
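
The B+Tree detail matters because the leaf blocks are linked together, so a range scan descends the tree once, then walks consecutive leaves. A hypothetical example (table and index names made up):

SELECT * FROM orders WHERE order_date BETWEEN '2015-01-01' AND '2015-01-31';
-- Given INDEX(order_date), InnoDB descends once to '2015-01-01', then follows
-- the leaf chain until it passes '2015-01-31' -- no repeated tree descents.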

To estimate the amount of space needed for a row, sum up the datatype sizes (4 bytes for INT, etc.), then multiply by 3. ("3" is approximate, but generally useful.)
Then multiply by the number of rows.
If you have only a few indexes, that is usually good enough for the size of data + indexes.
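
A hypothetical worked example: a row of INT (4) + INT (4) + VARCHAR(100) averaging 50 bytes is roughly 60 bytes; times 3 gives ~180 bytes; a million such rows is ~180MB. Once the table has real data in it, check the actual numbers ('your_db' is a placeholder):

SELECT table_name,
       data_length / 1024 / 1024  AS data_mb,   -- data + PRIMARY KEY BTree
       index_length / 1024 / 1024 AS index_mb   -- the secondary-key BTrees
  FROM information_schema.TABLES
 WHERE table_schema = 'your_db';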

innodb_buffer_pool_size (the size of the InnoDB cache) should be no more than about 70% of available RAM. If that setting can hold all your tables, then you will minimize I/O.
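
For example (a sketch, assuming a dedicated MySQL server with 16GB of RAM; 70% of 16GB is about 11GB):

# my.cnf
[mysqld]
innodb_buffer_pool_size = 11G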

If your dataset is too big for the RAM available, then these things should be considered...
* Is there a "working set"? That is, will you spend most of your time on a subset of the data? If so, and the buffer_pool is big enough for that subset, then there will be little I/O.
* If you have an index (PRIMARY or secondary) starting with a 'random' value such as UUID/GUID/MD5/SHA1, the activity on that index will be jumping around a lot -- that is, there cannot be a small "working set".
* If your activity (INSERTs, SELECTs, etc) is low enough, then I/O does not matter. A Rule of Thumb: a typical machine (with an ordinary spinning disk) can do about 100 I/Os per second. So, if you are doing fewer than 100 _simple_ queries per second, I/O is not a problem.
* SELECT COUNT(*) FROM tbl -- this is not 'simple', since it must scan the entire table (or perhaps one index). That could cost a lot of I/O if the blocks are not cached. (See the EXPLAIN sketch just below.)
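
EXPLAIN will tell you whether a query is 'simple' or a scan (tbl is hypothetical):

EXPLAIN SELECT COUNT(*) FROM tbl;
-- type = index (or ALL) with rows ~ the whole table means a full scan --
-- potentially a lot of I/O if those blocks are not already cached.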

What type of dataset will it be? How many rows do you expect? What types of queries will you write?

Some notoriously bad schemas:
* find "nearest" using latitude/longitude
* key-value pairs
* huge datasets that get purged of old data daily/monthly
* pagination of lots of web pages
I have techniques to optimize those (and others); adding RAM is the 'wrong' solution.

If you think you need 1000 I/Os per second, we can discuss hardware and software solutions.
