Re: Partitioning
Posted by: Rick James
Date: April 23, 2016 05:11PM

Your example had two similarly named columns:
URI_CM
URI
I was careful to distinguish them. But then your last reply said `URI_CM`, which does not agree with `PRIMARY KEY(URI, id)`.

For "locality of reference", the first field of the PRIMARY KEY should be always specified in the WHERE clause. And, since your table will be too big to cache, it will save a lot of I/O time.

I hesitate to answer the question of "concat 9M files together" vs "9M LOAD DATAs". The former is _probably_ faster, but _may_ have some unknown scaling problem. For example, you can't have 9M files in one directory and do "cat * >all_9m.csv" because the shell will truncate the expansion of "*" at 5K bytes (or something).

If you could sort the files in URI order (or is it URI_CM?), then the load would go faster.

I would
Plan A (9M LOADs):
1. load 1000 files and see how long that takes.
2. Multiply by 9000 to see what century it will finish in.
3. Either accept that and continue, or abandon Plan A.
Plan B (copy together):
1. Do the copy somehow.
2. Do some math to verify that all 9M got copied. (Or abandon)
3. Try the one huge LOAD.
Plan C (chunks)
1. Write a more complex script to gather a few hundred files together into a single file and LOAD it;
2. repeat

Your question is beyond my experience. However, my experience says that _any_ of the Plans I suggest _might_ blow up.

Options: ReplyQuote


Subject
Written By
Posted
April 19, 2016 12:29PM
April 20, 2016 11:34PM
April 21, 2016 07:47AM
April 22, 2016 06:41PM
April 22, 2016 06:59PM
Re: Partitioning
April 23, 2016 05:11PM
April 25, 2016 09:27AM
April 30, 2016 05:08PM
May 04, 2016 11:41PM
May 09, 2016 09:18AM
May 10, 2016 10:15AM
May 11, 2016 08:11AM
May 17, 2016 07:49AM
May 23, 2016 03:21PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.