> If the JOIN criteria does not include the _first_ field of the index, that index will (usually) not be used.
I am aware of that. When I write a join, I follow the order of the index; the EXPLAIN output confirms it.
> Folly. Many operations on a partitioned table open _all_ tables before they realize they need to touch only one partition. (I consider this a design flaw, but we are stuck with it.) Opening more than, say, 50, partitions begins to noticeably slow things down.
I think I can manage to drop some old partitions, will be looking into that. Thanks.
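For reference, dropping old partitions could look something like this, assuming RANGE partitioning and hypothetical table/partition names:

```sql
-- Check existing partitions and their row counts first
SELECT PARTITION_NAME, TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME = 'fact_table';   -- hypothetical table name

-- Dropping a partition removes all its rows instantly, with no table scan
ALTER TABLE fact_table DROP PARTITION p_old;   -- hypothetical partition name
```

This is also far cheaper than a `DELETE ... WHERE` over the same rows, since no row-by-row work or index maintenance is involved.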
> Are you saying that a partition has (or will have) typically a million rows? And your table is headed for having a billion rows? 200GB of data?
I estimated about 20 MM records per year. So far there are 16 MM records across 103 partitions, and the table currently holds 2,681 MB of data.
> Do you use LOAD DATA? Batched INSERT statements? (I recommend batching 100-1000 rows at a time.)
I know LOAD DATA / bulk INSERT gives the best insert performance. But since this is a data warehouse project, I am using Pentaho (Kettle) Data Integration to manage the load: it can update the dimension and fact tables in the same job. That means batched INSERTs, at about 1,200 records per second (not great, but acceptable).
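A minimal sketch of the multi-row INSERT batching suggested above (table and column values are hypothetical; real batches would carry 100-1000 rows per statement):

```sql
-- One round trip inserts many rows, amortizing parsing and index updates
INSERT INTO fact_table (ID_DIA_FOTO, ID_DDD_UF, ID_PROD_SERV, PRECO)
VALUES
  (20140101, 11, 42, 10.50),
  (20140101, 21, 42, 12.00),
  (20140102, 11, 17,  9.75);
  -- ... continue up to a few hundred rows per statement
```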
> Are the rows being inserted in some order? If they are inserted in ID_DIA_FOTO,ID_DDD_UF order, it will be more efficient, especially as the table gets bigger.
> Most of your fields are NULLable. Any fields that cannot be NULL should be declared NOT NULL. Each 8 fields you change to NOT NULL would save 1GB of table space (in MyISAM).
> Most fields are INT SIGNED and are named ID_... Probably they should be UNSIGNED. Probably some (maybe not all) could be MEDIUMINT or SMALLINT, maybe even TINYINT. Each change from INT to MEDIUMINT will save 1GB (over 1 billion rows) in the .MYD, plus another 1GB for each index it is in.
> Are you expecting hyper-inflation? 99,999,999,999,999,999.99 is a really huge price in dollars, Euros, Pesos, etc. (It takes about 11 bytes.)
Wow! I didn't know about this. I will take a look. Thanks!
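A sketch of the kind of column tightening being suggested (column names and exact types are assumptions; the right sizes depend on the real value ranges):

```sql
ALTER TABLE fact_table
  MODIFY ID_DDD_UF    SMALLINT UNSIGNED NOT NULL,   -- was INT SIGNED, NULLable
  MODIFY ID_PROD_SERV MEDIUMINT UNSIGNED NOT NULL,  -- MEDIUMINT covers up to ~16M ids
  MODIFY PRECO        DECIMAL(12,2) NOT NULL;       -- instead of an oversized DECIMAL
```

On a table this size the ALTER itself will rebuild the table, so it is worth testing on a copy and scheduling during a quiet window.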
> The only JOINs or WHERE clauses that can use what you have are those that include (at least) one of these:
> ID_PROCESSO_DE_VENDA -- primary key
> ID_DIA_FOTO -- partitioning
> ID_PROD_SERV -- secondary key
> Any query that does not include at least one of these is destined to scan all billion rows. That will take hours.
I will take a look into that too.
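One quick way to check that a query uses both partition pruning and one of those keys, assuming MySQL 5.x where `EXPLAIN PARTITIONS` is available (the query itself is hypothetical):

```sql
EXPLAIN PARTITIONS
SELECT *
FROM fact_table
WHERE ID_DIA_FOTO = 20140101
  AND ID_PROD_SERV = 42;
-- The 'partitions' column should list only one partition,
-- and 'key' should name an index rather than showing NULL.
```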
> You should seriously consider Summary Tables rather than joining to the Fact table. I have frequently discussed such...
> (and search for 'Data Warehouse', 'Summary', 'big').
I seriously considered that, but my business is very volatile and there are only 2 team members managing the database. Creating summary tables would improve performance drastically, but the trade-off is maintenance effort. So I decided to keep one larger table at a finer granularity because it is easier to manage.
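For the record, a minimal sketch of what such a summary table could look like here, assuming a daily grain on ID_DIA_FOTO and ID_PROD_SERV (all names and the PRECO column are assumptions):

```sql
CREATE TABLE resumo_diario (
  ID_DIA_FOTO  INT UNSIGNED NOT NULL,
  ID_PROD_SERV MEDIUMINT UNSIGNED NOT NULL,
  QTD_VENDAS   INT UNSIGNED NOT NULL,
  TOTAL_PRECO  DECIMAL(14,2) NOT NULL,
  PRIMARY KEY (ID_DIA_FOTO, ID_PROD_SERV)
);

-- Refresh one day at a time after the nightly load
INSERT INTO resumo_diario
SELECT ID_DIA_FOTO, ID_PROD_SERV, COUNT(*), SUM(PRECO)
FROM fact_table
WHERE ID_DIA_FOTO = 20140101
GROUP BY ID_DIA_FOTO, ID_PROD_SERV;
```

Reports would then join against `resumo_diario` instead of scanning the fact table; the maintenance cost is the one INSERT...SELECT per load.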
Thanks for the tips, I will come back w/ the results!
By the way, I skipped disabling/enabling the indexes before/after inserting records, and the insert step did not slow down. I will keep monitoring this. But hey, that's good news!
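For completeness, the disable/enable step that was skipped looks like this on a MyISAM table (table name is hypothetical); it defers non-unique index maintenance until the load finishes:

```sql
ALTER TABLE fact_table DISABLE KEYS;   -- stop updating non-unique indexes
-- ... bulk insert here ...
ALTER TABLE fact_table ENABLE KEYS;    -- rebuild non-unique indexes in one pass
```

Note that `DISABLE KEYS` affects only non-unique indexes, so the PRIMARY KEY is still maintained row by row either way, which may be why skipping it made no visible difference.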