MySQL Forums
Forum List  »  InnoDB

Re: Performance Problems for large inserts into INNODB table via LOAD DATA
Posted by: Michael Vitale
Date: February 28, 2011 09:01AM

> PRIMARY KEY (TKT_ID), UNIQUE KEY iux_TicketsTKT_ID(TKT_ID),
You said: A PRIMARY KEY is a UNIQUE KEY. So the second is totally redundant; recommend DROPing it. Having it significantly slows down the load.
My response: confused about this one, are you implying that MySQL created 2 unique keys: one for the index and one for the primary key?

You said: TicketAttachments has no _explicit_ PRIMARY KEY. It is usually better to have one, even it it is just INT UNSIGNED NOT NULL AUTO_INCREMENT.
My response: There is a primary key defined on it, look again:
FILE_ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,

You said: How big is the table?
My response: Actually I have multiple tables, one for each year, but the attachments table is for all of them.
I am not using foreig keys because my application enforces referential integrity.
I have attached the show table status as you requested. It is a CSV file.

Tickets2010
Tickets2011
TicketAttachments


> innodb_io_capacity=2000
You said: Are you using SSDs? Hardware RAID? If not, this value may be excessive. An ordinary drive cannot handle more than about 200.
My response: I am using software raid level 1.

> I do an explicit commit every 90 seconds.
You said, Does this mean that your Python script generates a LOAD DATA file of a few thousand rows, then closes it and issues a COMMIT?
My Response: I have thousands of LOAD DATA files (a previous program created them in MySQL format).
There is one file for the tickets table and it can have 1-20 rows.
1 file for attachments, which can only have one row.
So, every 90 seconds I issue a commit and a LOAD DATA has been issued during that time for hundreds of files.

You said:
> create_date VARCHAR(40) NULL,
> mod_date VARCHAR(40) NULL,
> read_date VARCHAR(40) NULL,
There is a much more compact, and usable, DATE datatype.
My Response: Actually the data for these is in various types of formats, so I treat it as unformatted data.

You said: What is your conclusion on Tablespaces, etc?
My response: I love controlling where this data resides and reducing the impact on ibdata.
I do not see any measurable performance difference whether I use tablespaces or not.
There is one problem though, the INFORMATION_SCHEMA does not track information for it in the TABLESPACES or FILES tables.
I created a separate bug for this.

You said: A significant issue in the performance of loading Tickets is whether TKT_ID values are ordered or random.
My response: the data is ordered. TKT_ID acts like an auto-generated number but it is managed by my application.

You said: Do you find that performance starts out fast, and later slows down? Some other pattern?
Yes, I see performance starts out fast (1000-5000 rows inserted per second), then slows down after about an hour (100-500 rows per second).

Options: ReplyQuote




Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.