Date: October 03, 2008 09:03AM
I wonder if anybody can suggest a way of improving my warehousing setup, since MySQL seems to be a major bottleneck at present.
I have a custom ETL program that I've written in Java. It takes data from instance servers and shoves it into the warehouse. Initially it did this directly, but that turned out to be far too slow because of the duplicate checks on every insert.
What I've done instead is insert the data into staging tables (InnoDB) without deduping, then perform a few INSERT ... SELECT statements with unique compound keys to move the data from the flat staging tables into the warehouse star schema.
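To give an idea, the dedup step looks roughly like this (table and column names here are made up for illustration, not my real schema):

```sql
-- Hypothetical staging -> star-schema move. The unique compound key on
-- fact_sales (source_id, event_time) makes INSERT IGNORE silently skip
-- rows that are already in the fact table.
INSERT IGNORE INTO fact_sales (source_id, event_time, customer_key, amount)
SELECT s.source_id, s.event_time, d.customer_key, s.amount
FROM staging_sales s
JOIN dim_customer d ON d.customer_code = s.customer_code;
```

(INSERT IGNORE just drops duplicates; ON DUPLICATE KEY UPDATE would be the variant if I wanted to update existing rows instead.)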
Doing this has proved to be much quicker than inserting directly, but the bottleneck is getting the data into the staging tables.
I initially tried MyISAM, but InnoDB seems to be quicker. I perform bulk inserts of about 10,000 rows at a time.
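Each batch goes in as a multi-row (extended) INSERT rather than one statement per row, something along these lines (again with a hypothetical staging table and placeholder values):

```sql
-- One statement, and one round trip, per chunk of rows.
INSERT INTO staging_sales (source_id, event_time, customer_code, amount)
VALUES
  (1, '2008-10-01 00:00:00', 'C001', 9.99),
  (2, '2008-10-01 00:00:01', 'C002', 4.50),
  (3, '2008-10-01 00:00:02', 'C001', 1.25);
```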
I've just been getting excited about Brighthouse, only to be let down by reading that it doesn't yet have UTF-8 support.
Anyway, I guess my question is: what is the fastest possible way to insert a large amount (6 GB) of deduplicated data into a star schema, or even just into the staging tables?