MySQL Forums
Forum List  »  PHP

Best Way to update multiple rows, in multiple tables, with different data, containing over 50 million rows of data
Posted by: Max Resnikoff
Date: January 12, 2020 11:58AM

I am trying to migrate customers to a new database structure.

I have written a PHP script which loops through every record and determines whether that customer has duplicates in the database. If there are multiple occurrences, I update all tables linked to each customer (such as purchases, addresses, likes, wishlist, notes, etc.), pointing their customerID at the latest version of that customer.
Then I delete the old, duplicated records, since all of their data is now linked to the newest record of the customer.
Essentially, the goal is to take all of a customer's data, which is spread across multiple records, and link it to one record instead of 10+.
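As a sketch of what this does for one customer group (IDs, table names, and columns are illustrative placeholders, not my real schema):

```sql
-- Placeholders: customers(customerID, email) plus child tables
-- such as purchases(purchaseID, customerID, ...).
-- Suppose customer 456 is the newest record and 123/124 are duplicates.

-- Re-point the duplicates' child rows at the newest customerID
-- (repeated for each of the 5 linked tables):
UPDATE purchases
SET customerID = 456
WHERE customerID IN (123, 124);

-- Then delete the now-redundant duplicate customer rows:
DELETE FROM customers
WHERE customerID IN (123, 124);
```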

The issue I am facing is that running this script takes so long that I get a 502 timeout, even if I restrict it to customers whose names start with 'A', then change to 'B' and run the script again, then 'C', and so on (which would take forever to do anyway).

The script basics:
Get all customers and group them by email.
Loop through each grouped email/customer.
If there is more than one record in the group, run a SELECT for all versions of that customer (ordered newest first).
Loop through this data:
If it is the first iteration, this record is the latest and greatest (store its ID for later use).
If not, update the customerID in 5 different tables to the latest customerID (stored from the first iteration).
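The steps above map roughly onto queries like the following (table and column names are my assumptions; it also assumes a higher customerID means a newer record):

```sql
-- Group customers by email and keep only groups with duplicates:
SELECT email, COUNT(*) AS copies
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;

-- All versions of one customer, newest first:
SELECT customerID
FROM customers
WHERE email = 'someone@example.com'
ORDER BY customerID DESC;
```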

As you can see, there are 5 queries inside two nested loops.
I know this is not ideal, but this script is for one-time use (to migrate data to a new database), so I am not worried about long-term efficiency/issues.

So my question is:
How can I update 5 tables for each occurrence of a customer (could be 10 duplicates in the database)?

So for 1 actual customer we could see 10 duplicates, and 100+ rows linked in each of the 5 tables to update the primary key in.
Which would result in:
100,000 actual customers (which have been grouped by email)
1,000,000 including duplicates
50,000,000 rows to update the customerID to the newest customerID in secondary tables.

So that's a minimum of 50,000,000 queries that I am trying to execute...

I have searched online about:
Concatenating UPDATE queries and sending them as one batch (MySQL still executes each statement individually).
Using CASE WHEN customerID = '123' THEN '456' END inside a single UPDATE.
Using INSERT into a temp table and then running a table-to-table UPDATE (impractical!).
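For reference, the temp-table approach would look roughly like this (names are placeholders; it assumes the largest customerID per email is the newest record):

```sql
-- Map every duplicate ID to the ID to keep for that email:
CREATE TEMPORARY TABLE id_map AS
SELECT c.customerID AS old_id, newest.keep_id
FROM customers c
JOIN (
    SELECT email, MAX(customerID) AS keep_id
    FROM customers
    GROUP BY email
    HAVING COUNT(*) > 1
) newest ON newest.email = c.email
WHERE c.customerID <> newest.keep_id;

ALTER TABLE id_map ADD INDEX (old_id);

-- One table-to-table UPDATE per child table, instead of millions of
-- single-row UPDATEs (repeat for the other 4 linked tables):
UPDATE purchases p
JOIN id_map m ON m.old_id = p.customerID
SET p.customerID = m.keep_id;
```

A join-based UPDATE lets MySQL make one pass per table rather than issuing one query per row, though each statement still has to rewrite millions of rows.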

None of these makes any noticeable/practical difference to the execution time...





