MySQL Forums
Forum List  »  Partitioning

Re: Index with partitioning
Posted by: Rick James
Date: February 09, 2013 01:22PM

> I am not sure if this is correct but since it is a fact table, I need to join constantly to dimension tables through it´s keys. This index is to improve join performance. Any advice here?

If the JOIN criteria does not include the _first_ field of the index, that index will (usually) not be used.

> It is actually 25% now.

OK.

> Since it can hold 1024 partitions

Folly. Many operations on a partitioned table open _all_ tables before they realize they need to touch only one partition. (I consider this a design flaw, but we are stuck with it.) Opening more than, say, 50, partitions begins to noticeably slow things down.

> Sometimes takes 1.7 MM records on a single partition

Are you saying that a partition has (or will have) typically a million rows? And your table is headed for having a billion rows? 200GB of data?

> The insert performance is not a problem ... takes 20 minutes to do so.

Do you use LOAD DATA? Batched INSERT statements? (I recommend batching 100-1000 rows at a time.)

Are the rows being inserted in some order? If they are inserted in ID_DIA_FOTO,ID_DDD_UF order, it will be more efficient, especially as the table gets bigger.

Most of your fields are NULLable. Any fields that cannot be NULL should be declared NOT NULL. Each 8 fields you change to NOT NULL would save 1GB of table space (in MyISAM).

Most fields are INT SIGNED and are named ID_... Probably they should be UNSIGNED. Probably some (maybe not all) could be MEDIUMINT or SMALLINT, maybe even TINYINT. Each change from INT to MEDIUMINT will save 1GB (over 1 billion rows) in the .MYD, plus another 1GB for each index it is in.

> decimal(19,2)

Are you expecting hyper-inflation? 99,999,999,999,999,999.99 is a really huge price in dollars, Euros, Pesos, etc. (It takes about 11 bytes.)

> `ID_PROCESSO_DE_VENDA` int(11) NOT NULL AUTO_INCREMENT,

That can handle only 2 billion. INT UNSIGNED would let you get to 4 billion. Be sure you don't overshoot that.

The only JOINs or WHERE clauses that can use what you have are those that include (at least) one of these:
ID_PROCESSO_DE_VENDA -- primary key
ID_DIA_FOTO -- partitioning
ID_PROD_SERV -- secondary key
Any query that does not include at least one of these is destined to scan all billion rows. That will take hours.

You should seriously consider Summary Tables rather than JOINing to the Fact table. I have frequently discussed such...
http://mysql.rjweb.org/bestof.html (and search for 'Data Warehouse', 'Summary', 'big').

Options: ReplyQuote


Subject
Views
Written By
Posted
2487
January 30, 2013 06:23AM
1464
February 01, 2013 10:20PM
1666
February 04, 2013 08:03AM
1482
February 07, 2013 10:09PM
1456
February 08, 2013 06:55AM
Re: Index with partitioning
1307
February 09, 2013 01:22PM
1972
February 14, 2013 12:26PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.