MySQL Forums
Forum List  »  Newbie

Getting duplicate inserts
Posted by: Andrew Hodkinson
Date: September 30, 2015 03:09PM

Sorry, this is going to be a long one...

I am currently importing a product list into a landing table called XMLImport. Within this table there are roughly 23,000 products, some 3000 of which I am referring to as master products, the remaining are more like products options. For example, if there was a single shirt in 3 different sizes, and 3 different colours, there would be a total of 10 products in the table. The products can be grouped by the `name` field, and the master product can be identified as the one without a `size` and `colour`. Here is an example...

+------------+----------------------+-------------------+------------+------------------------------------------+
| product_id | name | size | colour | sku |
+------------+----------------------+-------------------+------------+------------------------------------------+
| 1 | Collared Check Shirt | | | R-BOX1-8453 |
| 2 | Collared Check Shirt | SM - UK (8-10) | Grey | 8453-Grey-SM - UK (8-10) |
| 3 | Collared Check Shirt | L/XL - UK (12-14) | Grey | 8453-Grey-L/XL - UK (12-14) |
| 4 | Collared Check Shirt | SM - UK (8-10) | Red | 8453-Red-SM - UK (8-10) |
| 5 | Collared Check Shirt | L/XL - UK (12-14) | Red | 8453-Red-L/XL - UK (12-14) |
| 6 | Collared Check Shirt | SM - UK (8-10) | Brown | 8453-Brown-SM - UK (8-10) |
| 7 | Collared Check Shirt | L/XL - UK (12-14) | Brown | 8453-Brown-L/XL - UK (12-14) |
+------------+----------------------+-------------------+------------+------------------------------------------+

I then take all the master products and move them into ActiveProducts using this query...

INSERT INTO ActiveProducts (name, type, size, colour, sku, weight, instock, quantity, price, image1, image2, image3, image4, image5, image6, description, dateadded)
SELECT name, type, size, colour, sku, weight, instock, quantity, price, image1, image2, image3, image4, image5, image6, description, NOW()
FROM XMLImport
WHERE size = '' AND colour = ''
AND sku NOT IN (SELECT sku FROM ActiveProducts)

Next, I then take all product options and move them into ActiveProducts, whilst linking them back to the master product defined as `parent_id`, as you'll see here...

INSERT INTO ActiveProducts (parent_id, name, type, size, colour, sku, weight, instock, quantity, price, image1, image2, image3, image4, image5, image6, description, dateadded)
SELECT ap.product_id, xml.name, xml.type, xml.size, xml.colour, xml.sku, xml.weight, xml.instock, xml.quantity, xml.price, xml.image1, xml.image2, xml.image3,
xml.image4, xml.image5, xml.image6, xml.description, NOW()
FROM XMLImport xml
JOIN ActiveProducts ap ON ap.name = xml.name
WHERE (xml.size != '' OR xml.colour != '')
AND xml.sku NOT IN (SELECT sku FROM ActiveProducts)

I eventually end up something like this in the ActiveProducts table...

+------------+-----------+----------------------+-------------------+------------+------------------------------------------+
| product_id | parent_id | name | size | colour | sku |
+------------+-----------+----------------------+-------------------+------------+------------------------------------------+
| 1 | NULL | Collared Check Shirt | | | R-BOX1-8453 |
| 2 | 1 | Collared Check Shirt | SM - UK (8-10) | Grey | 8453-Grey-SM - UK (8-10) |
| 3 | 1 | Collared Check Shirt | L/XL - UK (12-14) | Grey | 8453-Grey-L/XL - UK (12-14) |
| 4 | 1 | Collared Check Shirt | SM - UK (8-10) | Red | 8453-Red-SM - UK (8-10) |
| 5 | 1 | Collared Check Shirt | L/XL - UK (12-14) | Red | 8453-Red-L/XL - UK (12-14) |
| 6 | 1 | Collared Check Shirt | SM - UK (8-10) | Brown | 8453-Brown-SM - UK (8-10) |
| 7 | 1 | Collared Check Shirt | L/XL - UK (12-14) | Brown | 8453-Brown-L/XL - UK (12-14) |
+------------+-----------+----------------------+-------------------+------------+------------------------------------------+

So far so good, but this is where it all goes wrong. Let's say 2 more product options become available and go in the XMLImport table, like this...

+------------+----------------------+-------------------+------------+------------------------------------------+
| product_id | name | size | colour | sku |
+------------+----------------------+-------------------+------------+------------------------------------------+
| 1 | Collared Check Shirt | | | R-BOX1-8453 |
| 2 | Collared Check Shirt | SM - UK (8-10) | Grey | 8453-Grey-SM - UK (8-10) |
| 3 | Collared Check Shirt | L/XL - UK (12-14) | Grey | 8453-Grey-L/XL - UK (12-14) |
| 4 | Collared Check Shirt | SM - UK (8-10) | Red | 8453-Red-SM - UK (8-10) |
| 5 | Collared Check Shirt | L/XL - UK (12-14) | Red | 8453-Red-L/XL - UK (12-14) |
| 6 | Collared Check Shirt | SM - UK (8-10) | Brown | 8453-Brown-SM - UK (8-10) |
| 7 | Collared Check Shirt | L/XL - UK (12-14) | Brown | 8453-Brown-L/XL - UK (12-14) |
| 8 | Collared Check Shirt | SM - UK (8-10) | Green (Dk) | 8453-Green (Dk)-SM - UK (8-10) |
| 9 | Collared Check Shirt | L/XL - UK (12-14) | Green (Dk) | R-BOX1-8453-Green (Dk)-L/XL - UK (12-14) |
+------------+----------------------+-------------------+------------+------------------------------------------+

When I run the script again, hoping the 2 new product options will get added, it duplicates the new product options by the number of existing product options. For example...

+------------+----------------------+-------------------+------------+------------------------------------------+
| product_id | name | size | colour | sku |
+------------+----------------------+-------------------+------------+------------------------------------------+
| 1 | Collared Check Shirt | SM - UK (8-10) | Green (Dk) | 8453-Green (Dk)-SM - UK (8-10) |
| 2 | Collared Check Shirt | SM - UK (8-10) | Green (Dk) | 8453-Green (Dk)-SM - UK (8-10) |
| 3 | Collared Check Shirt | SM - UK (8-10) | Green (Dk) | 8453-Green (Dk)-SM - UK (8-10) |
| 4 | Collared Check Shirt | SM - UK (8-10) | Green (Dk) | 8453-Green (Dk)-SM - UK (8-10) |
| 5 | Collared Check Shirt | SM - UK (8-10) | Green (Dk) | 8453-Green (Dk)-SM - UK (8-10) |
| 6 | Collared Check Shirt | SM - UK (8-10) | Green (Dk) | 8453-Green (Dk)-SM - UK (8-10) |
| 7 | Collared Check Shirt | SM - UK (8-10) | Green (Dk) | 8453-Green (Dk)-SM - UK (8-10) |
| 1 | Collared Check Shirt | L/XL - UK (12-14) | Green (Dk) | R-BOX1-8453-Green (Dk)-L/XL - UK (12-14) |
| 2 | Collared Check Shirt | L/XL - UK (12-14) | Green (Dk) | R-BOX1-8453-Green (Dk)-L/XL - UK (12-14) |
| 3 | Collared Check Shirt | L/XL - UK (12-14) | Green (Dk) | R-BOX1-8453-Green (Dk)-L/XL - UK (12-14) |
| 4 | Collared Check Shirt | L/XL - UK (12-14) | Green (Dk) | R-BOX1-8453-Green (Dk)-L/XL - UK (12-14) |
| 5 | Collared Check Shirt | L/XL - UK (12-14) | Green (Dk) | R-BOX1-8453-Green (Dk)-L/XL - UK (12-14) |
| 6 | Collared Check Shirt | L/XL - UK (12-14) | Green (Dk) | R-BOX1-8453-Green (Dk)-L/XL - UK (12-14) |
| 7 | Collared Check Shirt | L/XL - UK (12-14) | Green (Dk) | R-BOX1-8453-Green (Dk)-L/XL - UK (12-14) |
+------------+----------------------+-------------------+------------+------------------------------------------+

I could add a LIMIT 1 to the end of the script SELECT sub-query which would work perfectly, but then I would end up having to run the SQL, thousands of times to get the entire product list added. The other option would be to set a UNIQUE on the `sku` field which would reject to duplicate, but would cause error and unnecessary resources wasted.

Any ideas how I solve this?

Options: ReplyQuote


Subject
Written By
Posted
Getting duplicate inserts
September 30, 2015 03:09PM
September 30, 2015 03:37PM
September 30, 2015 05:14PM
October 01, 2015 09:42PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.