MySQL Forums
Forum List  »  Performance

39 MILLION ROW MARCH (USING MyISAM)
Posted by: Nathan Huebner
Date: November 03, 2005 07:05AM

I call this the 39 Million Row March :P

PURPOSE: To simulate how MySQL would handle 39 million lines of customer data. This is going to be used to split all of my incoming customers into seperate tables, by the first 3 numbers of their ID #. This will divide the number of 39 million by up to 999. Making performance higher, data run faster, inserts run faster, updates run faster, and with proper indexing, generally everything for my data storage for new users will run FAST.

NOTE: USE MyISAM for this, it's not tested in other formats. Also, I used utf8_general_ci for collation.

THIS WILL SHOW YOU HOW TO SIMULATE 39 MILLION ROWS OF MYSQL DATA. WITHOUT TAKING LONGER THAN 30 MINUTES TO COMPLETE. NO INSERTS INVOLVED.


Creating 39 million lines to simulate Emails + Id #'s
example:

WANT TO TEST IT?

IF YOU WANT TO SEE HOW MYSQL CAN HANDLE 39 MILLION ROWS OF KEY DATA

Create a table, call it userlocation (or whatever you want)

Then add a column "email" and "id", make the email 100 Varchar, and id 15 Varchar.
Remember, this is all you need, you don't want extra stuff in this table, it will cause a lot of slow-down. So if My ID # is 10399999 then i would be stored in a table called 103 (or table 10 if you used 2 digits). This is how we will organize all of our new customers. I do not recommend simulating the storage of 1 COMPLETE 39 MILLION ROW DATABASE WITH FULL DETAILS, CUSTOMER CONTACT INFORMATION, OR PRODUCTS. You should seperate these into many different tables, and use this method as a KEY FINDER to locate those tables. The less data you put in your KEY FILE (userlocation), THE FASTER IT WILL GO.

You could work this any way you want... by first digit, first two digits, first three digits, even first 4, but first 3 and 4 starts to create a lot of tables, i don't know how that would work, especially if you have 100 to 999 as first digit ID numbers, that would create 899 tables, which probably isn't a good idea. but this method will allow you to split up one giant table, into many different tables, and be able to find those specific tables fast.

OK FIRST: LETS MAKE 39 MILLION ROWS OF DATA in a TAB DELIMITED FORMAT
YOU WILL NEED PHP FOR THIS

READ MY WARNING FIRST:

WARNING WARNING WARNING
SCRIPT WILL EAT AROUND 1.09 GIGABYTES MAKE SURE YOU HAVE DISK SPACE
IT WILL STORE TO C:\mil.txt
WARNING WARNING WARNING




<?
# Creates exactly 39 million rows
# May take up to 14 minutes depending on machine / hd space
set_time_limit(9999); # this will keep it from timing out
echo "Writing...<br><br>";
$fp=fopen("C:\mil.txt","wb");
for ($x=1000000; $x < 40000000; $x++) {
$fw=fwrite($fp, "$x@aol.com" . chr(9) . $x . "\r\n");
}
fclose($fp);
echo "Finished writing...";
?>

########### DO NOT CONTINUE UNTIL YOUR C:\mil.txt FILE IS COMPLETE

# NOW TO GET THE 39 MILLION ROWS INTO MYSQL

After you have your C:\mil.txt file ready, you should now LOAD DATA INFILE to that userlocation table (remember, this is just a performance test).

HERE IS THE SYNTAX TO DO THIS:

BE SURE YOU HAVE ONLY COLUMNS: Email (varchar 100), Id (varchar 15)

LOAD DATA INFILE 'C:/mil.txt' INTO TABLE userlocation;

Give it a minute or so, its 39 million lines :P

__________________________________________________________

LOAD DATA STATEMENT RETURN TIME: 1 minute 6.91 seconds. (39 Million Rows)

___________________________________________________________

INDEX YOUR EMAIL COLUMN, BECAUSE YOU WILL HAVE TO SEARCH BY IT.
MySQL Time: 14 minutes.
Cardinality of Email SHOULD BE 39 million

BE SURE TO ADD set_time_limit(9999); TO YOUR read_dump.php IN THE PHP MY ADMIN FOLDER, OR INDEX IT MANUALLY THROUGH COMMAND PROMPT.

Click the LIGHTNING icon on the row of EMAIL for your table in PHPMyAdmin.

___________________________________________________________

Querying the table:

SELECT * FROM `byemail` WHERE email = '24123765@aol.com'
Query return time: 0.05 sec...

This seems to work.

Now you have the email address, and you can return the ID # from that row to locate the user's table. In the future, this should make things easier on you.

Good luck! :)



Edited 1 time(s). Last edit at 11/03/2005 07:08AM by Nathan Huebner.

Options: ReplyQuote


Subject
Views
Written By
Posted
39 MILLION ROW MARCH (USING MyISAM)
2462
November 03, 2005 07:05AM
1357
November 16, 2005 06:42PM
1380
November 22, 2005 10:22PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.