
Re: Updating 100 million record tables
Posted by: Rick James
Date: November 30, 2011 09:40AM

> 9. Growing chorus for splitting the tables. Have issues with cutoffs on that though. For instance: how to do? Absolutely going to have 10% or more cases where if cutoffs are by date, then will have to use multiple tables to find results as a widget made in Y1 may be shipped in Y2 or Y3 or even Y(n).

PARTITION is a way to semi-automatically split data into multiple "tables".
* What is the "purge" policy (if any)? That is, can you delete old data? PARTITION by RANGE leads to a good (and very efficient) way to do so via DROP PARTITION (see the sketch after this list).
* If the data and the purging are date-oriented, how big a timeframe are we talking about? (I'm fishing for whether to PARTITION by day or some other unit.)
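To make that concrete, here is a rough sketch of monthly RANGE partitioning plus a purge, issued from PHP via mysqli. Everything in it is a placeholder, not your real schema: the table name `widgets`, the date column `date_1`, the partition names, and the "drop the oldest month" policy.

<?php
// Sketch only: table name, column name, and retention policy are made up.
$db = new mysqli('localhost', 'user', 'password', 'test');

// One-time step: convert the table to monthly partitions on a DATE column.
// TO_DAYS() is the usual choice for PARTITION BY RANGE on dates.
$db->query("
  ALTER TABLE widgets
  PARTITION BY RANGE (TO_DAYS(date_1)) (
    PARTITION p2011_10 VALUES LESS THAN (TO_DAYS('2011-11-01')),
    PARTITION p2011_11 VALUES LESS THAN (TO_DAYS('2011-12-01')),
    PARTITION p2011_12 VALUES LESS THAN (TO_DAYS('2012-01-01')),
    PARTITION pfuture  VALUES LESS THAN MAXVALUE
  )");

// Purge: dropping a whole partition is nearly instantaneous,
// unlike DELETEing millions of individual rows.
$db->query("ALTER TABLE widgets DROP PARTITION p2011_10");
?>

One caveat to keep in mind: MySQL requires the partitioning column to be part of every PRIMARY KEY / UNIQUE KEY on the table, which matters if you add an AUTO_INCREMENT primary key.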

I understand your need to obfuscate the schema, but please use column names that give me clues, like `widget`, `widget_id`, `date_1`, `money_1`.

Why the concern about this UPDATE? It sounds like it is a one-time task? Or do you need to do it again and again? Are there time constraints on how fast it must finish? (I am working on another large auditing task that involves hourly data coming in; it needs to be added to the database and ready for queries, preferably well before the next hour's data arrives. Such constraints were a key factor in designing the schema and the processing.)

> 10. The loop: How do you set up a loop just using MySQL.
I would not. Well, you can do such a loop in a Stored Procedure, but I am not convinced that is any better than writing the code in Perl/PHP/Java/VB/... If you are comfortable with PHP, use that; it can be run from the command line or from a web page. (There is a sketch of the loop below, after the note about c4.)

> c4 is a varchar
Maybe the only change is the first line:
$a = '';
I hope my pseudo-code is easy enough to turn into PHP. I used $ variables a la PHP/Perl (but unlike Java, etc).
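Pulling the pieces together, here is roughly what the 1000-row clump walk might look like in PHP with mysqli. It is only a sketch under assumptions: the table name `tbl`, the columns `c4` (the VARCHAR being walked, which should be indexed) and `c1`, and the no-op SET clause are stand-ins for your real names and your real UPDATE; error handling is omitted.

<?php
// Sketch: walk the table in ~1000-row clumps keyed on c4 (a VARCHAR).
// Table/column names and the SET clause are placeholders.
$db = new mysqli('localhost', 'user', 'password', 'test');

$a = '';                                   // start below every VARCHAR value
while (true) {
    // Find where the next clump ends: the c4 value of the 1001st row past $a.
    $res = $db->query("SELECT c4 FROM tbl
                        WHERE c4 > '" . $db->real_escape_string($a) . "'
                        ORDER BY c4
                        LIMIT 1000, 1");
    $row = $res->fetch_row();

    if ($row === null) {
        // Fewer than 1000 rows left: finish them off and stop.
        $db->query("UPDATE tbl
                       SET c1 = c1   /* replace with your real SET clause */
                     WHERE c4 > '" . $db->real_escape_string($a) . "'");
        break;
    }

    $b = $row[0];
    $db->query("UPDATE tbl
                   SET c1 = c1      /* replace with your real SET clause */
                 WHERE c4 > '"  . $db->real_escape_string($a) . "'
                   AND c4 <= '" . $db->real_escape_string($b) . "'");

    $a = $b;                               // next clump starts here
}
?>

Each UPDATE is a short statement of its own (autocommit), so other connections are locked out only briefly; if the server is busy you could also sleep() for a second between clumps.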

> Added an auto increment primary key to tables if that helps.
Hmmm; (scratching my head). On the one hand, it would even out the clumps; on the other hand, a clump would no longer finish at a c4 boundary.
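For contrast, a sketch of clumping by an AUTO_INCREMENT id (again with placeholder names, and the same no-op SET standing in for the real one): each statement touches at most 1000 rows, but a run of rows sharing one c4 value can straddle two clumps.

<?php
// Sketch: clump by AUTO_INCREMENT id instead of by c4.
// Even-sized clumps, but they do not line up with c4 boundaries.
$db = new mysqli('localhost', 'user', 'password', 'test');

$row = $db->query("SELECT MAX(id) FROM tbl")->fetch_row();
$max = (int)$row[0];

for ($start = 1; $start <= $max; $start += 1000) {
    $end = $start + 999;
    // Gaps in id (from deleted rows) just mean some clumps touch fewer rows.
    $db->query("UPDATE tbl
                   SET c1 = c1   /* replace with your real SET clause */
                 WHERE id BETWEEN $start AND $end");
}
?>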

> Great advise on not locking up the server.
MyISAM locks the whole table while it does anything to the table. That would be a long lock for millions of rows, but it is tolerable for 1000 rows. InnoDB uses row locking, but going through an entire multi-million-row table can be invasive with InnoDB, too; the 1000-row clumps would go unnoticed. (Don't rush to InnoDB; the disk footprint would be about twice as big.)

When you are in production, what other requirements will there be? I have touched on taming the UPDATE -- this was assuming others need to get in there and look at the data without being locked out for hours. (Such things freak out newbies -- they respond by rebooting.)

> New to most of this.
Well, this is relatively heavy-duty stuff. Of the tables described (SHOW TABLE STATUS) on these forums, 100M rows is at the 97th percentile.

Memory tuning (the first step in performance -- your issues are way beyond this):
http://mysql.rjweb.org/doc.php/memory

Big deletes (a common performance problem -- your task involves "update" instead of "delete", but some of the tips are still relevant since the page talks about walking through big tables, PARTITIONing, etc.):
http://mysql.rjweb.org/doc.php/deletebig
