Replacing a subset of records in a 8.5M row table, with performance?
I have a table with 8.5M rows. Every week I receive in files the latest set of these records, which I want to reconcile against the existing records. I.e. remove those that do not exist in the file and add those that do not exist in the table.
I tried to use the INSERT ...... ON DUPLICATE UPDATE method, but each file has about 60,000 records and takes about 30 minutes to complete. Since there are about 104 of these then it would take 52 hours to complete.
So I thought I was going to fix it by loading each files' worth of records into a HEAP table (takes about 40 seconds). Then DELETE all those rows that match on the grouping key in the table with 8.5M rows then insert the new records from the HEAP table into the tables with the 8.5M rows. But the DELETE on the 60,000 reocrds in the first files is still running after 40 minutes, so it is proving slower.
Here is the temp table DDL:
CREATE TABLE IF NOT EXISTS FI_CROSS_REF_DATA_IDENTIFIERS_TEMP
(ASSET_ID BIGINT NOT NULL, ID_CD CHAR(3) NOT NULL, ID_NUMBER VARCHAR(30) NULL, ID_EFFECTIVE_DATE DATE NOT NULL, KEY `U_Index` TYPE HASH (`ASSET_ID`,`ID_CD`,`ID_NUMBER`,`ID_EFFECTIVE_DATE`)) ENGINE = MEMORY
The index is there to remove duplicate records ( they are in the file).
Then after the fast INSERT I do the delete:
DELETE FROM FI_CROSS_REF_DATA_IDENTIFIERS
WHERE FI_CROSS_REF_DATA_IDENTIFIERS.ASSET_ID IN
(SELECT ASSET_ID FROM FI_CROSS_REF_DATA_IDENTIFIERS_TEMP)
Finally the INSERT of the new records
INSERT INTO FI_CROSS_REF_DATA_IDENTIFIERS
( ASSET_ID, ID_CD, ID_NUMBER, EFFECTIVE_DATE )
SELECT ASSET_ID, ID_CD, ID_NUMBER, ID_EFFECTIVE_DATE
FROM FI_CROSS_REF_DATA_IDENTIFIERS_TEMP
The tables being loaded cannot afford to be off line. It needs Select only access from a front end
Suggested approaches?
thx.
David