Search code examples
sqlpostgresqlunionpostgresql-performance

Postgres UPDATE x WHERE id IN y


This has probably been asked before, but googling for keywords like "IN" doesn't work very well.

This is my query:

UPDATE tblCustomer SET type = 2 
WHERE idcustomer                                
IN (SELECT fidcustomer1
    FROM tblorder                   
     UNION                      
    SELECT fidcustomer2
    FROM tblorder                   
   )                                

To break it down: I want to set the type (just an int) of all customers to 2 for all customers that appear in the order-table, in one of either column.

On my test data, none of these tables contain more than a few hundred rows, yet the query runs for many minutes (even without the UNION, that doesn't seem to make much of a difference), apparently re-doing the inner query once per row in customer. I could obviously rewrite it into a single SELECT DISTINCT(id), followed by a few hundred single-row updates and do the logic in whatever language I use for my ODBC-access, but that's just a hack.

How can I rewrite this properly?

Addendum: The table I want to update contains a lot of relatively large BYTEA blobs, a few MB per row. They are set to Storage External or Extended, but I wonder if that could make the sequential scans slow. All updates seem to take a long time, not just this one.


Solution

  • -------------------------------
    -- Use two EXISTS:
    -------------------------------
    UPDATE tblCustomer tc
    SET type = 2
    WHERE EXISTS (
        SELECT *
        FROM tblorder ex
        WHERE ex.fidcustomer1 = tc.idcustomer
        )
    OR EXISTS (
        SELECT *
        FROM tblorder ex
        WHERE ex.fidcustomer2 = tc.idcustomer
        );
    
    -------------------------------
    -- or combine the two EXISTS::
    -------------------------------
    UPDATE tblCustomer tc
    SET type = 2 
    WHERE EXISTS (
        SELECT *
        FROM tblorder ex
        WHERE ex.fidcustomer1 = tc.idcustomer
        OR ex.fidcustomer2 = tc.idcustomer
        );
    

    My gut feeling is that the first version (with two separate exists) will perform better, because the executor could short-circuit if one of the existss would yield True. That would avoid the removal-of-duplicates phase (and probably sorting), which is inherent to the UNION construct.