MySQL Forums
Forum List  »  InnoDB

Re: Data Directory is not moving subpartition
Posted by: Rick James
Date: August 30, 2014 11:34AM

> CHAR(30) ... It is not always 30. This is also a throughput enhancement I found on the web. It seemed reasonable that there would be less overhead and the cost in space is tolerable.
> I was trying to trade space for performance.

The _old_ advice about using FIXED width records was almost never valid. Furthermore, it applied only to MyISAM. (Anything written before about 2002 would not even mention InnoDB.)

I say the advice was rarely valid because it is almost always better to save the disk space, hence shrink the data, hence have less I/O, hence be faster -- than to make UPDATEs a little cleaner.

> The data is DNA sequences, there's no way to normalize it but it's a 4 character alphabet and should compress nicely.

Yes. In theory it can compress about 4:1. Maybe it will be tighter if there are a lot of repeated subsequences.

> It is variable length and I could put it in a varchar

Plan A: Use TEXT CHARACTER SET ASCII and hope that InnoDB's compression is efficient enough.
Plan B: Use compress(), zip(), bzip() (or whatever) in your code, then make the field BLOB.

InnoDB's compression is improved in 5.6 (another reason to upgrade). Still, it spends time and space making two copies of the data blocks in the buffer_pool.

Another argument for Plan B... Innodb moves 'big' fields off into a different block. The actual cutoff is complex and depends on several things, but I suspect that a TEXT with several hundred bytes will be 'moved', but the equivalent pre-compressed BLOB (of only a few hundred bytes) will fit within the constraints.

TEXT and BLOB have a 2-byte overhead for length, plus allow 64K bytes.

> remove the indices during insert and build an index later

You may find that to be either impractical or still too slow.
Changing the indexed fields to VARCHAR will make the indexes smaller, hence more efficient to maintain.

Can you share some info on the secondary indexes? In particular, are they 'random' or more predictable (as with AUTO_INCREMENT)? CHAR(30) -- how long are they typically?

> sample_id int(11) NOT NULL,

Since this is part of all 4 indexes, can it be made smaller?
MEDIUMINT UNSIGNED NOT NULL -- 3 bytes; range of 0..16 million
SMALLINT UNSIGNED NOT NULL -- 3 bytes; range of 0..65535
That will save 300MB or 600MB across the secondary keys. (OK, that's only a few percent, but it is pretty simple.)

100M rows of 1KB rows (uncompressed) = 100GB data + 20GB indexes
After pre-compressing, VARCHARing, etc: 25GB + 15GB

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: Data Directory is not moving subpartition
755
August 30, 2014 11:34AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.