MySQL Forums
Forum List  »  Performance

Re: 39 MILLION ROW MARCH (USING MyISAM)
Posted by: Nathan Huebner
Date: November 17, 2005 05:19PM

Alan,

I do drop-shipping with various high-volume product vendors (some with 2 million or more products), and I compare their prices to Amazon and other selling sites in order to find out my position to sell. I have to do all of my own programming, database administration, etc. It's tough to get MySQL to act like a professional service when there's too much data.

Anyway, I made a thing called HyperStorage. It's similar to this 39 million row march idea, and supposedly it's very similar to the MySQL partitioning system they are setting up. That looks exciting, but I'm still uneasy about using their partitioning system; until they beta test it repeatedly over the next year or so, I can't trust it not to have bugs.

Feel free to try out my HyperStorage system. You can download it free for use, editing, and redistribution at http://www.sellchain.com/hyperstorage/hyperstorage.zip

Feel free to give out the URL. I hope everyone uses it and learns the true power of database partitioning.

It might seem weird to set up, but at least it has instructions and semi user-friendly information. The technology is extremely similar to the one MySQL is putting together, though theirs may have better performance than mine, because they are the database engineers and I'm just a programmer.

I keep coming back around to the idea of partitioning my data by using a storage engine, and I'm thinking I might get a little more complex about how to do it.

For example:

I have 2 million rows of data: prices, product information, etc. You can't leave all of that information in one table; queries will take too long, even with proper indexing.

So the obvious solution is to split the table up, preferably into as many tables as practical. Then take 1 or 2 unique identifiers from the 2 million rows, store them in a KEY LOCATION table, and put a location in that table so you know where to find that specific piece of data.

Performance improves because you can search around 40 million to 100 million rows of data almost instantly when there are only 2 or 3 columns. It's probably faster still with 1 column, minimal varchar lengths, and a good collation.

So you go into the key table, find the unique identifier, and that tells you the database and table where that identifier's full data lives.
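The two-step lookup described above could be sketched like this. These table and column names are my own invention for illustration, not taken from HyperStorage:

```sql
-- Hypothetical key-location table: one slim row per product,
-- pointing at the database and table that hold the full record.
CREATE TABLE key_location (
    product_id  VARCHAR(32) NOT NULL,
    db_name     VARCHAR(64) NOT NULL,
    table_name  VARCHAR(64) NOT NULL,
    PRIMARY KEY (product_id)
) ENGINE=MyISAM;

-- Step 1: find where the full row lives.
SELECT db_name, table_name
FROM key_location
WHERE product_id = 'B000123XYZ';

-- Step 2: the application builds the second query from the result
-- (table names can't be bound as parameters), e.g.:
-- SELECT * FROM shard_db_3.products_0412 WHERE product_id = 'B000123XYZ';
```

Because the key table holds only a couple of short columns, its index stays small enough to search quickly even at tens of millions of rows.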

I will actually be splitting the 2 million rows into 5000 tables...

That's 400 rows per table. I could perform my updates, selects, and calculations really fast, and if I needed to do some sort of batch processing, I could get it done REALLY fast.

Or I could reduce that to 500 tables and 4000 rows per table, etc.
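One simple way to decide which of the N tables a given row belongs in is a modulo on a numeric id. This is my assumption for illustration; HyperStorage may route rows differently:

```sql
-- With 5000 tables, a numeric id maps to a table suffix 0..4999;
-- 2,000,000 rows spread this way gives ~400 rows per table.
SELECT MOD(123456, 5000) AS table_suffix;  -- 3456, i.e. table data_3456

-- With 500 tables instead, the same scheme gives ~4000 rows per table.
SELECT MOD(123456, 500) AS table_suffix;   -- 456, i.e. table data_0456
```

The nice property of modulo routing is that the application can compute the target table name directly from the id, without even consulting the key table.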

The main idea is to keep as little data as possible in each table, so searching stays fast.

I would also recommend making a FULLTEXT table, which contains the full text you are searching (like Movie Title, etc.), plus a location and identifier.
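On MyISAM, a FULLTEXT side table like that might look like the following. The names here are illustrative, not from any real schema:

```sql
-- Hypothetical search table: the searchable text plus a pointer
-- back to the partition table holding the full record.
CREATE TABLE search_text (
    product_id  VARCHAR(32) NOT NULL,
    table_name  VARCHAR(64) NOT NULL,
    title       TEXT        NOT NULL,
    FULLTEXT idx_title (title)
) ENGINE=MyISAM;

-- Search by title, then follow table_name to the full row.
SELECT product_id, table_name
FROM search_text
WHERE MATCH(title) AGAINST('matrix reloaded');
```

This keeps the expensive full-text index on one narrow table instead of duplicating it across all the partition tables.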

Here is some experience with performance problems:

If I take 1 million rows and place them in 1 table, running 5000 separate SELECT statements takes around 15 seconds.

If I take 5000 rows and place them in 1 table, the same 5000 separate SELECT statements take around 3 to 4 seconds.

????? I am boggled.

Let me know if you need help with HyperStorage; it should turn some heads.

Nathan
