MySQL :: Inserting at Extreme Volumes

New Topic

Inserting at Extreme Volumes

Posted by: Michael Gargiullo
Date: October 11, 2006 08:44PM

Carry over from the MySQL Users list:
I'm working on a project in which we'd like to convert from Oracle to
MySQL. We need to partition our data for speed concerns. Currently in
Oracle I create 8, 3 hour partitions for each day (Currently running
450M -750M rec inserts/day). I was looking for matching functionality in
MySQL, but it seams daily partitions are as close as I'm going to come.

We're running 5.1.10 and I'm having a bit of trouble creating partitions
in both new tables and altering old tables. Below is one example of
what I've tried.

Can anyone shed some light on this subject?

-Mike

create table t1 (c1 int default NULL, c2 varchar(30) default NULL, c3
datetime default NULL) engine=myisam PARTITION BY RANGE(to_days(c3))

PARTITION p0 VALUES LESS THAN (to_days('2006-09-24'))(

SUBPARTITION s0a

DATA DIRECTORY = '/FW_data1'

INDEX DIRECTORY = '/FW_indx1'

),

PARTITION p1 VALUES LESS THAN (to_days('2006-09-26'))(

SUBPARTITION s1a

DATA DIRECTORY = '/FW_data2'

INDEX DIRECTORY = '/FW_indx2'

)

PARTITION p2 VALUES LESS THAN (to_days('2006-09-28'))(

SUBPARTITION s2a

DATA DIRECTORY = '/FW_data3'

INDEX DIRECTORY = '/FW_indx3'

)

);

-Mike G
--------------------
Mike,
How is this table being updated?

a) From one source like a batch job?
b) Or from hundreds of users concurrently?

If a), then why not just create 1 table per day (or 3 tables per day) and
when you want to reference (the entire day or) a week, just create a Merge
Table?
http://dev.mysql.com/doc/refman/5.1/en/merge-storage-engine.html

If b), then you need to use InnoDb tables because that has row locks
compared to MyISAM's table locks.

Mike
--------------------

We're using the Load infile function to load the data generated by another process. We do not do updates, but occasionally need to either walk the table or run a query against it. On Oracle, we currently need 3 hour partitions to keep the 5 indexes timely.

This system handles 450-750 Million inserted rows per day with 5 fields being indexed. This number will be closer to 2 Billion records / day by Spring 2007 we've been told.

For example, I diverted the full flow of data to MySQL for 15 minutes and inserted 9 Million records with a back up of loader files. I need to speed this up. Unfortunately, table structure and indexes are static and cannot be changed.

-Mike G
------------------

Mike,
I've done a lot of Load Data with large tables and as you no doubt discovered, as the number of rows in the table increases, the insert speed decreases. This is due to the extra effort involved in maintaining the index as the rows are being loaded. As the index grows in size, it takes

longer to maintain the index. This is true of any database. MyISAM tables are going to be faster than InnoDb in this case.

You can speed it up by:
1) Add as much memory as possible in the machine because building the index will be much faster if it has lots of ram.
2) Modify your My.Cnf file so key_buffer_size=1500M or more. (Assuming you have 3gb or more installed) This allocates memory for building the index.
3) If the table is empty before you add any rows to it, Load Data will run much faster because it will build the index *after* all rows have been loaded. But if you have as few as 1 row in the table before running Load

Data, the index will have to be maintained as the rows are inserted and this slows down the Load Data considerably.
4) Try throwing an exclusive lock on the table before loading the data.
I'm
not sure but this might help.
5) If your table already has rows in it before running Load Data, and the table has indexes defined, it is much faster if your disable the indexes to the table before running Load Data, and then enable the index after Load

Data has completed. See "Alter Table Enable/Disable Indexes" for more info.
6) If you are using Alter Table to add indexes after the table has data,

make sure you are adding all indexes in one Alter Table statement because MySQL will copy the table each time the Alter Table is run.

If you are going to be adding 2 billion rows per day, you might want to try
1 table per hour which will reduce the number of rows to < 100 million which may be more manageable (assuming 24 hour day). You can then create a merge table on the 24 rows so you can traverse them. You can of course create a merge table just for the morning hours, afternoon hours, evening hours etc.. Name each table like: 20060925_1400 for 4PM on 9/25/2006. Of

course you may also want to summarize this data into a table so you don't need all of this raw data lying around.

-Mike
-------------------------------------------------

Loading 500,000 rows with 200M rows in the DB with Indexes on takes 22 Minutes.

Loading 500,000 rows with 200M rows in the DB with indexes turned off and then build indexes after the load took over 75 minutes. This would probably work if we only inserted 40-80 million rows a day total, or had a few hours where data was not being inserted.

Daily partitions are created then sub partitioned across 6 data disks and 6 index disks.

We attempted to build a new table per hour, and merge them after 3 hours. We killed the processes after 2 hours. 1 hour of data is approx 18GB. The server only has 12GB of RAM.

I wish we could partition down to TO_HOUR instead of TO_DAY

[EDIT] - We already summarize, but re need the raw data for legal reasons

-Mike G

----------------

There's some discussion of this issue on the Partitioning Forum -
http://forums.mysql.com/list.php?106 - and you're more likely to get
topic-specific attention there from users and MySQL developers working
with partitioning than you are here on the General list.

Also, have you checked out the recent articles on partitioning available
from our DevZone? These include:

http://dev.mysql.com/tech-resources/articles/mysql_5.1_partitioning_with_dates.html

http://dev.mysql.com/tech-resources/articles/mysql_5.1_partitions.html

- both of which discuss date-based partitioning techniques that you
might find useful.

cheers

jon.

------------------

Our tests used the tech articles as starting points.

End Report:

Our first test was partitioned to the day and sub partitioned to the day as well. This started fine (with an empty table) but performance quickly degraded. MySQL’s Optimizer engine finest granularity is daily. We can create fake partitions smaller then 1 day, but we then loose the benefit of partitions as any action actually takes place across all partitions. SELinux interrupted the MySQL Process and corrupted the table.

The next test was daily partitions with fake hourly sub partitions (created by turning the date and hour of the day into an integer)..

This was a bit better. The first hundred Million records were loaded in just over 2 hours, but from 100 to 200 Million took an additional 12 hours

Next, we partitioned by our fake hour as above, then sub partitioned using the same method using by minute.

The data was loaded at about the same speed. (I didn’t start loading until about 6:30 pm or so)

We next tried to convert all of the varchar field to char fields. We’d sacrifice disk space for increased speed.

This appears to be slower then the first Daily partition scheme.

Only 58 Million recs in the first 2 hours.

We’ve asked on the MySQL Forums and on the MySQL Mailing list for the best way to partition the DB and load mass records into it.

The bottom line is MySQL does not appear to be able to keep up with the indexes at this data volume. Possible to revisit MySQL once partitioning TO_HOUR is available. Using sub partitioning and the MyISAM engine may have extreme performance gains over Oracle, but only after the partitioning issues are addressed.

We can and will continue to use MySQL at our lower volume (100-300M rec/day) sites.

-Mike

--------------

If anyone can add anything further, or knows when hourly partitioning will be available I'd love to hear about it.

Navigate: Previous Message• Next Message

Options: Reply• Quote

Subject

Views

Written By

Posted

Inserting at Extreme Volumes

6250

Michael Gargiullo

October 11, 2006 08:44PM

Re: Inserting at Extreme Volumes

3009

Jonathan Stephens

October 15, 2006 12:02AM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.