Thanks for the response, Rick.
Okay, let me provide some further details on the use case.
Background:
Currently, the J2EE application accesses the data from these CSV files, which contain the fields below, with the exception of ID and URI, which don't exist in the CSV files today. ID would be an auto-increment INT field that is part of the PK, and URI would be the CSV file name, which will appear in the WHERE clause of a SELECT statement. For example:
SELECT START_CM, END_CM, SCORE FROM BOOK_TBL WHERE URI='MC001'
START_CM = INT --> signifies the starting position of a given text.
END_CM = INT --> signifies the ending position of a given text.
SCORE = FLOAT --> signifies the percentage value.
URI = VARCHAR(~50) --> the name of the CSV file. It contains an alphanumeric value.
URI_CM1 = VARCHAR(~25) --> signifies a text (Text_1)
URI_CM2 = VARCHAR(~25) --> signifies a text (Text_2)
TEXT_1 = VARCHAR(~500 to ~1000, maybe more; I need to analyze) --> holds Arabic text.
TEXT_2 = VARCHAR(~500 to ~1000, maybe more; I need to analyze) --> holds Arabic text.
But TEXT_1 and TEXT_2 aren't going to be part of my initial SELECT. So, here's how the application behaves:
1. Issue a SELECT to get the set of records for a given CM (between 5 and 1,000 records).
2. Perform some actions.
3. While that data is in memory, the application issues another SELECT that returns a UNIQUE record to get TEXT_1 and TEXT_2, e.g.
SELECT TEXT_1, TEXT_2 FROM BOOK_TBL WHERE URI_CM1='ABxxxxx' AND URI_CM2='EFxxxxx' AND URI='MCxxxxx'
This query will return exactly ONE record.
Now my concern is: should I include TEXT_1 and TEXT_2 in my initial SELECT, or should I issue a second SELECT that returns a unique record?
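To make the trade-off concrete, here are the two options side by side (column names as described above; 'MC001' and the 'xxxxx' values are just the sample placeholders from earlier):

```sql
-- Option A: one round trip; every row carries the (large) Arabic text columns.
SELECT START_CM, END_CM, SCORE, TEXT_1, TEXT_2
FROM BOOK_TBL
WHERE URI = 'MC001';

-- Option B: two round trips; the first result set stays small, and the text
-- is fetched only for the one row that needs it.
SELECT START_CM, END_CM, SCORE FROM BOOK_TBL WHERE URI = 'MC001';

SELECT TEXT_1, TEXT_2
FROM BOOK_TBL
WHERE URI_CM1 = 'ABxxxxx' AND URI_CM2 = 'EFxxxxx' AND URI = 'MCxxxxx';
```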
There's no single field that can be unique in these CSV files (not by itself); a combination of a few fields can be, however. That's why I have to create an auto-increment INT to serve as -- part of -- the PK. The combination of URI_CM1, URI_CM2, and URI can uniquely identify a row.
**I realized I forgot to mention URI_CM1 and URI_CM2 in my initial post.
| ID | SCORE | URI_CM1 | URI_CM2 | URI | START_CM | END_CM | TEXT_1 | TEXT_2 |
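Putting the columns above together, a rough DDL sketch might look like the following. This is only a sketch, assuming MySQL-flavored syntax; the lengths are placeholders pending my analysis, and I used BIGINT for ID per your note that INT will not be big enough:

```sql
-- Sketch only; exact types and lengths still need analysis.
-- A Unicode character set will also be needed for the Arabic text columns.
CREATE TABLE BOOK_TBL (
    ID       BIGINT NOT NULL AUTO_INCREMENT,  -- INT too small for ~9 billion rows
    SCORE    FLOAT,
    URI_CM1  VARCHAR(25),
    URI_CM2  VARCHAR(25),
    URI      VARCHAR(50),      -- CSV file name, used in WHERE clauses
    START_CM INT,
    END_CM   INT,
    TEXT_1   VARCHAR(1000),    -- Arabic text; length TBD
    TEXT_2   VARCHAR(1000),    -- Arabic text; length TBD
    PRIMARY KEY (ID),
    UNIQUE KEY uq_cm (URI_CM1, URI_CM2, URI)  -- the natural unique combination
);
```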
Another pattern worth mentioning is that the entire 9 billion records can be categorized (or identified) in 3 main chunks:
1. ~5 million CSV files
2. ~2 million CSV files
3. ~2 million CSV files
These files typically start with a common set of characters, i.e. all the files start with either 1) ABxxxxxx, 2) EFxxxxxx, or 3) MCxxxxxx. So I was thinking, if it helps, of having 3 tables as opposed to 1. That way, at least we don't deal with one table of 9 billion records, but three -- rather -- smaller tables of 5, 2, and 2 billion records.
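To illustrate the 3-table idea (table names here are hypothetical, and the syntax assumes MySQL; it presumes a BOOK_TBL definition already exists to copy):

```sql
-- One table per file-name prefix instead of a single 9-billion-row table.
CREATE TABLE BOOK_TBL_AB LIKE BOOK_TBL;  -- URIs starting with 'AB', ~5 billion rows
CREATE TABLE BOOK_TBL_EF LIKE BOOK_TBL;  -- URIs starting with 'EF', ~2 billion rows
CREATE TABLE BOOK_TBL_MC LIKE BOOK_TBL;  -- URIs starting with 'MC', ~2 billion rows

-- The application would pick the table from the URI's first two characters:
SELECT START_CM, END_CM, SCORE FROM BOOK_TBL_MC WHERE URI = 'MC001';
```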
Some questions:
>>>"If there will be 5 billion rows, INT will not be big enough."
- What's the largest value INT can hold?
- Do I have another option -- besides INT -- to use as my auto-increment PK?