MySQL :: Re: Partitioning

New Topic

Re: Partitioning

Posted by: Rick James
Date: April 23, 2016 05:11PM

Your example had two similarly named columns:
URI_CM
URI
I was careful to distinguish them. But then your last reply said `URI_CM`, which does not agree with `PRIMARY KEY(URI, id)`.

For "locality of reference", the first field of the PRIMARY KEY should be always specified in the WHERE clause. And, since your table will be too big to cache, it will save a lot of I/O time.

I hesitate to answer the question of "concat 9M files together" vs "9M LOAD DATAs". The former is _probably_ faster, but _may_ have some unknown scaling problem. For example, you can't have 9M files in one directory and do "cat * >all_9m.csv" because the shell will truncate the expansion of "*" at 5K bytes (or something).

If you could sort the files in URI order (or is it URI_CM?), then the load would go faster.

I would
Plan A (9M LOADs):
1. load 1000 files and see how long that takes.
2. Multiply by 9000 to see what century it will finish in.
3. Either accept that and continue, or abandon Plan A.
Plan B (copy together):
1. Do the copy somehow.
2. Do some math to verify that all 9M got copied. (Or abandon)
3. Try the one huge LOAD.
Plan C (chunks)
1. Write a more complex script to gather a few hundred files together into a single file and LOAD it;
2. repeat

Your question is beyond my experience. However, my experience says that _any_ of the Plans I suggest _might_ blow up.

Navigate: Previous Message• Next Message

Options: Reply• Quote

Subject

Written By

Posted

Partitioning

Ahmad Sakhi

April 19, 2016 12:29PM

Re: Partitioning

Rick James

April 20, 2016 11:34PM

Re: Partitioning

Ahmad Sakhi

April 21, 2016 07:47AM

Re: Partitioning

Rick James

April 22, 2016 06:41PM

Re: Partitioning

Ahmad Sakhi

April 22, 2016 06:59PM

Re: Partitioning

Rick James

April 23, 2016 05:11PM

Re: Partitioning

Ahmad Sakhi

April 25, 2016 09:27AM

Re: Partitioning

Ahmad Sakhi

April 30, 2016 05:08PM

Re: Partitioning

Rick James

May 04, 2016 11:41PM

Re: Partitioning

Ahmad Sakhi

May 09, 2016 09:18AM

Re: Partitioning

Rick James

May 10, 2016 10:15AM

Re: Partitioning

Ahmad Sakhi

May 11, 2016 08:11AM

Re: Partitioning

Ahmad Sakhi

May 17, 2016 07:49AM

Re: Partitioning

Rick James

May 23, 2016 03:21PM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.