Search code examples
mysqlsqlcsvlarge-data

Updating a large MySQL table without any key/index information


I think I know what the problem is, but I need some help going in the correct direction.

I've got a tables 'products', and I've also got several temp_product tables for various suppliers.

The goal here is to update several fields in the products table from the appropriate temp_product table. My product suppliers give me a CSV file with all of their inventory data. Originally I was just looping through the CSV and updating line by line, but this takes forever, so now I load it into a temporary table using LOAD DATA LOCAL INFILE.

The problem I am having, is the UPDATE queries take forever to run, and most of the time MySQL just completely crashes. I am hoping I can show you my table structure, and somebody could help me out with what kind of key/index setup would work best?

I've tried 2 different update query variations, but neither one is working.

UPDATE product AS p, temp_product AS t
    SET p.quantity = t.quantity,
        p.ean = t.inventory,
        p.cost = t.price,
        p.date_modified = NOW()
    WHERE p.sku = t.sku

-AND-

UPDATE temp_product AS t
    INNER JOIN product AS p ON p.sku = t.sku
    SET p.quantity = t.quantity,
        p.ean = t.inventory,
        p.cost = t.price,
        p.date_modified = NOW()

Here is the structure to my tables in question:

temp_product

sku varchar(15) PRI
status  varchar(2)
statusid    int(11)
quantity    int(11)
inventory   varchar(15)
ETA varchar(25)
ETA_Note    varchar(255)
price   double(10,2)

product

product_id  int(11) PRI
model   varchar(64)
sku varchar(64)
upc varchar(50)
ean varchar(50)
mpn varchar(64)
location    varchar(128)
quantity    int(4)
price   decimal(15,4)
cost    decimal(15,4)
status  tinyint(1)
date_added  datetime
date_modified   datetime

I have a feeling I could get this to work correctly if I had keys/indices set up correctly. The only thing I have set up now is the Primary Key, but those don't match up across all the tables. I'm pretty new to all this, so any help would be appreciated.

To make things even more complicated, I'm not sure if some of my suppliers use the same SKUs, so I would like to update the product table WHERE sku = sku and location = 'suppliername'.

Thanks for the help!

EDIT: Slimmed down the problem a little bit, originally had a product and supplier_product table to update, once I get the product table working I can probably take it from there.


Solution

  • First of all, could you run SHOW CREATE TABLE product; and SHOW CREATE TABLE temp_product; and paste the results? Also, how exactly large is your product table? (select count(1) from products can help)

    Regarding the keys: you need at least to add sku key to your product table.

    If sku is supposed to be a unique field, then you can do it with the following command:

    ALTER TABLE product ADD UNIQUE KEY sku(sku);
    

    If sku is NOT a unique field, then you can still add it as a key like that:

    ALTER TABLE product ADD KEY sku(sku);
    

    but in that case, this mean that for one record with a partcular sku from the temp_product table, you will update more than one record in your product table.

    Regarding the table size: even if the table is large (say several million rows), but it's OK to run queries that take a lot of time (for example, if you are the only one using this database) then after you have added the key, either of the variants should in principle work and take less time than what it takes now. Otherwise, you would be better off with doing the update in batches (e.g. 100, 500 or 1000 records at a time) and preferably with some script that might even wait a little between the updates. This is especially recommended if your database is a master database that replicates to slaves.