MySQL Forums

Re: Insert Into ... Select, Delete Data
Posted by: Rick James
Date: September 07, 2014 11:57AM

> but would ask if anyone knows a query that operates like "cut" in the file system.

I don't see the need for such here.

> "Table_locks_waited"

Sounds like you are using MyISAM? InnoDB has much less contention on INSERTs. This may even eliminate the need for the tmp table(s). If you do keep them, the temp table(s) may as well be MyISAM with no indexes.
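If you do keep a staging table, a minimal sketch might look like this (table and column names are only placeholders for whatever your real schema uses):

CREATE TABLE staging_b (
  -- same columns as the real table; deliberately no PRIMARY KEY and no
  -- secondary indexes, so INSERTs into the staging area stay cheap
  col1 INT NOT NULL,
  col2 VARCHAR(100) NOT NULL,
  created_at DATETIME NOT NULL
) ENGINE=MyISAM;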

> the real table have more than 150 million rows

Let's see the SHOW CREATE TABLE. A common mistake is to have too many INDEXes, thereby slowing down INSERTs.
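(Something like the following, substituting your own table name; \G makes the output easier to read:)

SHOW CREATE TABLE real_table\G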

Is Replication involved? There are potential issues there.

> Now each application shall create its own table b at runtime, batch its entries there, and write them into the real table when table b's size is over XXXX.

There are several potential issues here.
* Yes, if it is convenient, each app could do its own batching and inserting. By doing its own INSERT..SELECT, you would not need the 2-table method I brought up. However, the app would need to stall briefly while it does the batch insert.
* Rather than waiting for XXX rows, it may make more sense to do it at some logical break point in the program.
* In a case like this, I gather up the rows to be inserted in an array (in PHP), then construct and execute a single multi-row INSERT; see the sketch after this list. This avoids the tmp table. (Inserting 100 rows in a single SQL statement is about 10 times as fast as 100 individual INSERTs. Going beyond 100 rows gets into "diminishing returns".)
* When multiple processes are batching inserts into an InnoDB table, sort the rows you are about to insert; this avoids a certain class of deadlocks. With INSERT...SELECT, this can be done by INSERT...SELECT ... ORDER BY ....
* When doing a multi-row INSERT, cap each statement at a few megabytes' worth of text. That matters more than any particular row count (though it could be expressed as XXX rows if the rows are reasonably consistent in size). This keeps you from overflowing max_allowed_packet (or a related limit such as net_buffer_size).
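
To illustrate the multi-row INSERT and the ORDER BY points above (a rough sketch; table, column, and value names are invented):

-- Multi-row INSERT built from the rows collected in the application;
-- aim for about 100 rows (and only a few MB) per statement:
INSERT INTO real_table (col1, col2, created_at)
VALUES
  (1, 'aaa', '2014-09-07 11:00:00'),
  (2, 'bbb', '2014-09-07 11:00:01'),
  -- ... and so on, up to roughly 100 rows per statement
  (3, 'ccc', '2014-09-07 11:00:02');

-- If you keep the staging table instead, sort while copying so that
-- concurrent batches insert in a consistent order (less deadlock-prone):
INSERT INTO real_table (col1, col2, created_at)
  SELECT col1, col2, created_at
    FROM staging_b
    ORDER BY col1, col2;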

Another issue with batching... What happens when the system crashes? Will you lose some of the rows that are being staged in the tmp tables or waiting in an array in PHP? Does this matter in the big picture? (It might not if this is transient data and you can afford a small blip in the graphs, etc.) Sometimes the answer lies in the ultimate source of the data -- Is the data being generated from something that will survive the crash, such that running the Java threads again will "pick up where they left off"?

If you cannot tolerate data loss, then you need to consider using InnoDB even for tmp tables, and use transactions in most steps.
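A rough sketch of what a transactional flush could look like if both tables are InnoDB (names are placeholders; this is not the only way to arrange it):

START TRANSACTION;
INSERT INTO real_table (col1, col2, created_at)
  SELECT col1, col2, created_at
    FROM staging_b
    ORDER BY col1, col2;
DELETE FROM staging_b;
COMMIT;

That way either the whole batch lands in the real table and is removed from the staging table, or neither happens.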

Will the 'real' data eventually be purged? Do you have PARTITIONing in place to turn the DELETEs into DROP PARTITION? (Otherwise, DELETEing a fraction of 150M rows will kill you.)
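For example, if rows are purged by age, something along these lines (column and partition names invented; note that the partition column must be part of every PRIMARY/UNIQUE key) turns the purge into a cheap metadata operation:

ALTER TABLE real_table
  PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p2014_08 VALUES LESS THAN (TO_DAYS('2014-09-01')),
    PARTITION p2014_09 VALUES LESS THAN (TO_DAYS('2014-10-01')),
    PARTITION pfuture  VALUES LESS THAN MAXVALUE
  );

-- Purging August then becomes:
ALTER TABLE real_table DROP PARTITION p2014_08;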

Sorry to throw so many thoughts at you. However, 150M rows requires lots of thinking, far more than for 150K rows.



