Re: Suggestion for optimization - reg
Posted by:
Rick James
Date: January 24, 2012 02:18AM
5M rows? 1 Terabyte?
VARCHAR has a 1-byte or 2-byte overhead for the length. If the string is really fixed length, them make it simply CHAR:
> `statecode` varchar(2) NOT NULL DEFAULT '',
--> CHAR(2)
Normalize into other tables any bulky fields. Then replace the field with a SMALLINT UNSIGNED or other type. Perhaps:
> `policestation` varchar(50) NOT NULL DEFAULT '',
> `local_registrar` varchar(50) DEFAULT NULL,
> `reason_not_captured` varchar(50) DEFAULT NULL,
> `birth_country` varchar(50) DEFAULT NULL,
> `fingerimage1` blob,
> `fingerimage2` blob, ...
and probably
> `fingerid1` varchar(2) DEFAULT NULL,
> `fingerid2` varchar(2) DEFAULT NULL, ...
Store in another table as 10 rows, linked to this table via
Similarly move out other blobs.
By hardcoding a single dateofbiometricenrolled in this table, you are restricting future technical advances. Do not store any of the biometrics (photo, fingerprints, signature, retinal scan, palm print, etc) in this table. Instead, store them in other table(s). This includes type, blob(s), date captured, etc.
> `age` varchar(10) DEFAULT NULL,
Eh? Doesn't that change annually? Plan on computing it rather than storing it.
> `fathernm` varchar(99) DEFAULT '',
> `mothernm` varchar(99) DEFAULT '',
> `spousenm` varchar(99) DEFAULT '',
That's rather limited. What about second marriages? Adoptions? Etc. I know someone with 9 grandparents.
> `eduname` varchar(99) DEFAULT '',
I know someone with 4 advanced degrees. (And normalize)
> `occuname` varchar(99) NOT NULL DEFAULT '',
Lots of people have two jobs. (And normalize)
State, district, town, ward, etc -- Normalize this clump out into another table.
The suggestions above will shrink the disk footprint somewhat.
> limited chance of crash
If there is a power failure, you need to REPAIR MyISAM tables. Perhaps you need to switch to InnoDB, which is self-repairing. (Now, change the estimate to 2TB.)
> Most of the time the table is queried only.
What SELECT(s) are done? By one of the keys? Looking for a single row?
As the table grows, single-row, keyed, lookups will not get much slower.
Is latin1 the correct charset to use? What dictatorship is this table being built for?