Re: Partitioning
Posted by: Rick James
Date: April 23, 2016 05:11PM
Your example had two similarly named columns:
URI_CM
URI
I was careful to distinguish them. But then your last reply said `URI_CM`, which does not agree with `PRIMARY KEY(URI, id)`.
For "locality of reference", the first field of the PRIMARY KEY should always be specified in the WHERE clause. Since your table will be too big to cache, doing so will save a lot of I/O time.
I hesitate to answer the question of "concat 9M files together" vs "9M LOAD DATAs". The former is _probably_ faster, but _may_ have some unknown scaling problem. For example, you can't have 9M files in one directory and do "cat * >all_9m.csv": the expansion of "*" will exceed the system's limit on argument-list length, and the command will fail with "Argument list too long".
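One way around the "*" problem is to walk the directory from a script instead of letting the shell expand the names. This is only a rough sketch, not something I have run at 9M scale. The function name `concat_files`, the `.csv` suffix, and the assumption that each file is small enough to read into memory are all mine.

```python
import os

def concat_files(src_dir, out_path, suffix=".csv"):
    """Append every file in src_dir whose name ends with suffix to out_path.

    Returns the number of files copied. Hypothetical helper, assumes
    each input file is small enough to read in one go.
    """
    copied = 0
    out_abs = os.path.abspath(out_path)
    with open(out_path, "wb") as out:
        # os.scandir streams directory entries, so no huge argument list is built.
        with os.scandir(src_dir) as entries:
            for entry in entries:
                if not entry.is_file() or not entry.name.endswith(suffix):
                    continue
                if os.path.abspath(entry.path) == out_abs:
                    continue  # do not copy the output file into itself
                with open(entry.path, "rb") as src:
                    data = src.read()
                out.write(data)
                # Keep the last row of one file from fusing with the first row of the next.
                if data and not data.endswith(b"\n"):
                    out.write(b"\n")
                copied += 1
    return copied
```

The files come back in whatever order the filesystem lists them, so this does nothing about sorting. The returned count is worth comparing against the number of files you expected.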
If you could sort the files in URI order (or is it URI_CM?), then the load would go faster.
I would try one of these plans:
Plan A (9M LOADs):
1. Load 1000 files and see how long that takes.
2. Multiply by 9000 to see what century it will finish in.
3. Either accept that and continue, or abandon Plan A.
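The arithmetic for Plan A is trivial, but here it is as a sketch. The 120-second sample time is a made-up number purely for illustration, and the function names are hypothetical. Straight multiplication is probably optimistic, since loads tend to slow down once the table no longer fits in cache.

```python
def estimate_total_seconds(sample_seconds, sample_files=1000, total_files=9_000_000):
    """Linear extrapolation from a timed sample to the full set of files."""
    return sample_seconds * (total_files / sample_files)

def describe(seconds):
    """Express a duration in days, to one decimal place."""
    return f"{seconds / 86400:.1f} days"

# Made-up sample: suppose 1000 files took 120 seconds to load.
total = estimate_total_seconds(120)   # 120 * 9000 = 1,080,000 seconds
print(describe(total))                # prints "12.5 days"
```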
Plan B (copy together):
1. Do the copy somehow.
2. Do some math to verify that all 9M got copied. (Or abandon)
3. Try the one huge LOAD.
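For the "do some math" step in Plan B, one possible check is to compare the total number of lines in the source files against the number of lines in the combined file. This is a sketch under the assumption that every source file ends with a newline and has a `.csv` suffix; `count_lines` and `verify_concat` are names I made up.

```python
import os

def count_lines(path, chunk_size=1 << 20):
    """Count newline bytes in a file, reading it in fixed-size chunks."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total

def verify_concat(src_dir, combined_path, suffix=".csv"):
    """Return (source file count, source line total, combined line total)."""
    combined_abs = os.path.abspath(combined_path)
    files = 0
    lines = 0
    with os.scandir(src_dir) as entries:
        for entry in entries:
            if not entry.is_file() or not entry.name.endswith(suffix):
                continue
            if os.path.abspath(entry.path) == combined_abs:
                continue  # the combined file is not one of the sources
            files += 1
            lines += count_lines(entry.path)
    return files, lines, count_lines(combined_path)
```

If the file count is not 9M, or the two line totals differ, something went missing in the copy. After the LOAD, the same line total can be compared with the row count of the table.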
Plan C (chunks):
1. Write a more complex script that gathers a few hundred files into a single file and LOADs it.
2. Repeat until all the files have been loaded.
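A Plan C script might look roughly like the following. Treat it as a sketch: the table name `my_table`, the comma delimiter, the batch size of 500, and the chunk file naming are all assumptions on my part. It only builds the LOAD DATA statements and leaves running them to whatever client you use. `work_dir` must be a different directory from `src_dir`, and the paths are assumed not to contain quotes or backslashes.

```python
import itertools
import os

def batches(paths, size):
    """Yield lists of up to `size` paths from any iterable of paths."""
    it = iter(paths)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch

def write_chunk(batch, chunk_path):
    """Concatenate the files in `batch` into chunk_path, one row per line."""
    with open(chunk_path, "wb") as out:
        for path in batch:
            with open(path, "rb") as src:
                data = src.read()
            out.write(data)
            if data and not data.endswith(b"\n"):
                out.write(b"\n")  # keep rows from adjacent files separate

def load_statement(chunk_path, table="my_table"):
    """Build the statement for one chunk; the table name is hypothetical."""
    return (f"LOAD DATA LOCAL INFILE '{chunk_path}' INTO TABLE {table} "
            "FIELDS TERMINATED BY ','")

def plan_c(src_dir, work_dir, batch_size=500, suffix=".csv"):
    """Write one chunk file per batch and yield its LOAD DATA statement."""
    paths = (e.path for e in os.scandir(src_dir)
             if e.is_file() and e.name.endswith(suffix))
    for n, batch in enumerate(batches(paths, batch_size)):
        chunk_path = os.path.join(work_dir, f"chunk_{n:06d}.csv")
        write_chunk(batch, chunk_path)
        yield load_statement(chunk_path, table="my_table")
```

Because `plan_c` is a generator, each chunk is written only when its statement is requested, so you can run the LOAD, delete the chunk file, and move on. Timing the first few chunks gives the same kind of estimate as step 1 of Plan A.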
Your question is beyond my experience. However, my experience says that _any_ of the Plans I suggest _might_ blow up.