Re: Row size computation
Posted by:
Rick James
Date: April 06, 2012 08:50PM
> Thanks to you, now i see that this increase is probably result of 'splits', maybe table should be restructured then?
No, probably not. So what if a table expands and contracts by, say, 30%? More details...
All indexing is done with BTree technology. When fetching a single row from a million row table, it will traverse about 3 levels of BTree. The block splits of the data have negligible impact on performance for this.
> `created_at` int(11) default NULL,
If that is a date, you possibly wanted INT UNSIGNED.
> `country` char(3) character set ascii collate ascii_bin NOT NULL,
Good. It is amazing how many people screw up country code.
> I guess that adding auto_increment ...
Changing the PK to be an auto-inc will avoid the splits (assuming no UPDATEs), but will make lookup by (`dev_id`,`app_id`,`app_version`) more costly:
INDEX(`dev_id`,`app_id`,`app_version`)
and
WHERE dev_id = ... AND ...
will drill down the BTree for that index (now a "secondary" index). At the leaf node, it will find the PK (the new id). Then it will drill down the BTree that contains the PK and the data row.
It is hard to say which would be better/worse. I will say that it probably does not matter nearly as much as we have spent time discussing it.
> Good question would be what is 'good enough' value of padding and what is impact of memory caching and performance.
You have no control over the padding.
The most important metric is whether the "working set" of all the InnoDB tables is small enough to fit into innodb_buffer_pool_size. If the data+index is (or will become) too big for the buffer_pool, you will incur more I/O. This will be a performance issue.
RoT (Rule of Thumb): If a SELECT can find all its blocks in the buffer_pool, it will run 10 times faster than if it has to hit the disk.
You have buffer_pool of 6GB, but only 60MB (data+index) in this table -- that is only 1% of the buffer pool. The other 99% is either unused or used by other tables.
> Do you know what is the impact of 'splits' (and as a result increased table size) on performance or can you point me somewhere where i can investigate it?
I have no pointer for it. But based on what you have said so far, the impact of the splits is very, very, insignificant. I recommend dropping the issue and focus on other things.
If you are having performance issues, it not because of block splitting.
> We occasionally do big selects...
Do they cause performance issues?