Search code examples
sqlpostgresqljoinsql-update

How does the FROM side of an UPDATE relate to the table targeted for UPDATE?


The following query (sourced from here Postgres SQL SELECT and UPDATE behaving differently)

update fromemailaddress
    set call = true 
    from email
    where email.fromemailaddress = fromemailaddress.fromemailaddress and
          LOWER(email.subject) ~ 'tester';

The way I read it is:

Line 1: update fromemailaddress

-- we tell the database that we are updating the fromemailaddress table

Line 2:        set call = true 

-- we tell the database that the field named "call" will be set to true

Line 3:        from email
Line 4:         where email.fromemailaddress = fromemailaddress.fromemailaddress and
Line 5:              LOWER(email.subject) ~ 'tester';

-- OK now things are getting fuzzy. What actually happens here? It appears that somehow the database takes the result of the query in lines 3 4 and 5, but how does that tell it which rows to update in the fromemailaddress table? What is the pseudocode? Is it something like:

for each row in (query from lines 3, 4, 5)
    set call=true?

I just can't see how the FROM side of the SQL update relates to the other side.

UPDATE:

Following the valuable links in @Erwin's answer below lead me to this information that gets to the core of what I was trying to understand:

http://www.postgresql.org/docs/current/interactive/sql-update.html

When a FROM clause is present, what essentially happens is that the target table is joined to the tables mentioned in the from_list, and each output row of the join represents an update operation for the target table. When using FROM you should ensure that the join produces at most one output row for each row to be modified. In other words, a target row shouldn't join to more than one row from the other table(s). If it does, then only one of the join rows will be used to update the target row, but which one will be used is not readily predictable.

Because of this indeterminacy, referencing other tables only within sub-selects is safer, though often harder to read and slower than using a join.


Solution

  • The UPDATE query you display is exactly the same as:

    UPDATE fromemailaddress f
    SET    call = true 
    FROM  (
       SELECT fromemailaddress
       FROM   email 
       WHERE  subject ILIKE '%tester%'
       ) e
    WHERE  e.fromemailaddress = f.fromemailaddress;
    

    subject ILIKE '%tester%' is a faster equivalent for subject ~ 'tester'. Details for LIKE, ILIKE and regular expression matching (~) in the manual or in this related answer on dba.SE:

    And effectively the same as:

    UPDATE fromemailaddress f
    SET    call = true
    WHERE  EXISTS (
       SELECT FROM email e
       WHERE  e.fromemailaddress = f.fromemailaddress
       AND    e.subject ILIKE '%tester%'
       );
    

    Use this instead.

    If there should be multiple rows in table email with the same fromemailaddress matching a row in fromemailaddress, then this form only executes one update per row, unlike your unfortunate original.

    Don't be confused by the fact that fromemailaddress is used as column and as table name here.

    Read the manual carefully here and here. In particular this bit:

    from_list

    A list of table expressions, allowing columns from other tables to appear in the WHERE condition and the update expressions. This is similar to the list of tables that can be specified in the FROM Clause of a SELECT statement. Note that the target table must not appear in the from_list, unless you intend a self-join (in which case it must appear with an alias in the from_list).