MySQL Forums
Forum List  »  Partitioning

Re: Index with partitioning
Posted by: Hélio Hélio
Date: February 14, 2013 12:26PM

> If the JOIN criteria does not include the _first_ field of the index, that index will (usually) not be used.

I am aware of that. When I make a join, I do it in the order of the index. The EXPLAIN statement can prove that.

> Folly. Many operations on a partitioned table open _all_ tables before they realize they need to touch only one partition. (I consider this a design flaw, but we are stuck with it.) Opening more than, say, 50, partitions begins to noticeably slow things down.

I think I can manage to drop some old partitions, will be looking into that. Thanks.

> Are you saying that a partition has (or will have) typically a million rows? And your table is headed for having a billion rows? 200GB of data?

I calculated an estimate of 20 MM records per year. At this far, there are 16 MM records with 103 partitions. Table has now 2,681 MB of data.

> Do you use LOAD DATA? Batched INSERT statements? (I recommend batching 100-1000 rows at a time.)

I know LOAD DATA/BULK Insert are best with insert performance. But since it is an data warehouse project, I am using Pentaho (Kettle) Data Integration to manage this. It can update the dimension and fact tables at the same time. And that means BATCH INSERT, with 1200 records per second of performance (which is not that good, but it is ok).

> Are the rows being inserted in some order? If they are inserted in ID_DIA_FOTO,ID_DDD_UF order, it will be more efficient, especially as the table gets bigger.

Most of your fields are NULLable. Any fields that cannot be NULL should be declared NOT NULL. Each 8 fields you change to NOT NULL would save 1GB of table space (in MyISAM).

Most fields are INT SIGNED and are named ID_... Probably they should be UNSIGNED. Probably some (maybe not all) could be MEDIUMINT or SMALLINT, maybe even TINYINT. Each change from INT to MEDIUMINT will save 1GB (over 1 billion rows) in the .MYD, plus another 1GB for each index it is in.

Are you expecting hyper-inflation? 99,999,999,999,999,999.99 is a really huge price in dollars, Euros, Pesos, etc. (It takes about 11 bytes.)

Wow! I didn´t know about this. Will take a look. Thanks

> The only JOINs or WHERE clauses that can use what you have are those that include (at least) one of these:
ID_PROCESSO_DE_VENDA -- primary key
ID_DIA_FOTO -- partitioning
ID_PROD_SERV -- secondary key
Any query that does not include at least one of these is destined to scan all billion rows. That will take hours.

I will take a look into that too.

> You should seriously consider Summary Tables rather than joining to the Fact table. I have frequently discussed such...
http://mysql.rjweb.org/bestof.html (and search for 'Data Warehouse', 'Summary', 'big').

I really considered that, but my business type is very volatile and there are only 2 team members managing the database. I think that create an summarized table would increase performance drastically but the trade off is management. So I decided to have a larger table with smaller granularity because it is easier to manage.

Thanks for the tips, I will come back w/ the results!

By the way, I skipped the disabling/enabling index before/after inserting records. The process didn´t slow down the inserting step. I will monitore this. But hey, that´s good news!

Options: ReplyQuote


Subject
Views
Written By
Posted
3040
January 30, 2013 06:23AM
1857
February 01, 2013 10:20PM
2043
February 04, 2013 08:03AM
1816
February 07, 2013 10:09PM
1799
February 08, 2013 06:55AM
1673
February 09, 2013 01:22PM
Re: Index with partitioning
2654
February 14, 2013 12:26PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.