Search code examples
sqlgoogle-bigquerymappingsql-like

bigquery update table using LIKE returns "UPDATE/MERGE must match at most one source row for each target row"


I have two tables and want to update table1 (original data table) with data found in table2 (mapping table) using a LIKE statement. However, I always get the error message for all my different tries:

Query error: UPDATE/MERGE must match at most one source row for each target row

Table1 (data table)

textWithFoundItemInIt         | foundItem
---------------------------------
hallo Adam                    |  
Bert says hello               | 
Want to find "Caesar"bdjehg   |

Table2 (mapping table)

mappingItem
------------
Adam
Bert
Caesar

Expected result

textWithFoundItemInIt         | foundItem
---------------------------------
hallo Adam                    |  Adam
Bert says hello               |  Bert
Want to find "Caesar"bdjehg   |  Caesar

Queries:

UPDATE `table1`
SET foundItem= mt.mappingItem
    FROM `mappingTable` mt
    WHERE textWithFoundItemInIt LIKE CONCAT('%', mt.mappingItem, '%');


UPDATE `table1`
SET foundItem= mt.mappingItem
    FROM `mappingTable` mt
     WHERE INSTR(textWithFoundItemInIt , mt.mappingItem) >1;


UPDATE `table1`
SET foundItem = (SELECT mt.mappingItem FROM `table2` AS mt
WHERE textWithFoundItemInIt LIKE CONCAT('%', mt.mappingItem, '%')
)
WHERE TRUE; 


UPDATE `table1`
SET foundItem= mt.mappingItem
FROM `table1`
inner join  `table2` mt on textWithFoundItemInIt LIKE CONCAT('%', mt.mappingItem, '%');

I have also removed all duplicate values from both table 1 and table 2, but still same error message. I also tried to use a join statement, but I got this err msg: "Alias table1 in the FROM clause was already defined as the UPDATE target"

I found these similar questions in SO and tried to use their approaches:

Unfortunately they were not helpful to solve my problem. So I think this is not a duplicate question.

Thank you very much for your thoughts.


FOLLOW UP QUESTION

I am referring to the solution posted by @Jon. Once again thank you for your help. However, after testing it with different data, there is still the issue that it does not work if there are duplicates in 'table1'. Of course this problem comes from the 'GROUP BY' statement - and w/o this, the UPDATE query does not work, resulting in the error message stated in my original question. It doesn't work either, if I GROUP every value.

However, there can be duplicates in my 'table1' (Data) and my mapping table 'table2'. So to make it very precise, this is my goal:

Table1 (data table)

textWithFoundItemInIt         | foundItem
-------------------------------------------
hallo Adam                    |  
Bert says hello               | 
Bert says byebye              | 
Want to find "Caesar"bdjehg   |
Want to find "Caesar"bdjehg   |
Want to find "Caesar"again    |
Want to find "CaesarCaesar"again and again | <== This is no problem, just finding one Caesar is enough 

Table2 (mapping table)

mappingItem
------------
Adam
Bert
Caesar
Bert
Caesar
Adam

Expected result

textWithFoundItemInIt         | foundItem
--------------------------------------------
hallo Adam                    |  Adam
Bert says hello               |  Bert
Bert says byebye              |  Bert
Want to find "Caesar"bdjehg   |  Caesar
Want to find "Caesar"bdjehg   |  Caesar
Want to find "Caesar"again    |  Caesar
Want to find "CaesarCaesar"again and again | Caesar

It doesn't matter which Adam from Table2 is found and inserted into Table1, they will be the same. So it is even okay if the first Adam will be over written by the second Adam, or if the query just stops to search any further once one Adam is found.

If I execute Jon's 'SELECT' query, it would result in:

textWithFoundItemInIt         | foundItem
--------------------------------------------
hallo Adam                    |  Adam
Bert says hello               |  Bert
Bert says byebye              |  Bert
Want to find "Caesar"bdjehg   |  Caesar
Want to find "Caesar"again    |  Caesar
Want to find "CaesarCaesar"again and again | Caesar

It (correctly) omits the second "Want to find "Caesar"again", but that's unfortunately not what I need.

If it is easier, it would also be okay that in cases that two names are found in one row

textWithFoundItemInIt         | foundItem
---------------------------------------------
hallo Adam and Bert           |  Adam, Bert 
Bert says hello to Caesar     |  Bert, Caesar

or

textWithFoundItemInIt         | foundItem1      | foundItem2
---------------------------------------------------------------
hallo Adam and Bert           |  Adam           | Bert 
Bert says hello to Caesar     |  Bert           | Caesar

I hope this helps to understand my issue. In easy words: "It's just a mapping with multiple equal rows" ;-)

Thanks a lot :)


Solution

  • Your logic does not guard against this case:

    mappingItem
    -----------
    item1
    item12
    

    Since pattern %item1% will match both item1 and item12. There are a bunch of ways to avoid that, which depend on how you wish to handle these issues in poorly structured data. But this is the reason.

    You can look for problems with something like this:

    SELECT table1.textWithFoundItemInIt
         , COUNT(*)
      FROM table1
      JOIN table2
        ON table1.textWithFoundItemInIt LIKE CONCAT('%', table2.mappingItem, '%')
     GROUP BY table1.textWithFoundItemInIt 
    HAVING COUNT(*) > 1
    

    Once you decide how to handle these cases, you should be able to choose which, of the matching choices, to use in the UPDATE.

    Basically, make sure the logic limits the list of values to assign (per table1 row) to one (1) value.

    Here's one approach. I'm not sure bigquery supports this specific form. But it shows a logical approach.

    Review the data, noting we have a case where more than one mappingItem matches a table1 row:

    SELECT table1.textWithFoundItemInIt
         , COUNT(*)
         , MIN(table2.mappingItem) AS theItem1
         , MAX(table2.mappingItem) AS theItem2
      FROM table1
      JOIN table2
        ON table1.textWithFoundItemInIt LIKE CONCAT('%', table2.mappingItem, '%')
     GROUP BY table1.textWithFoundItemInIt 
    HAVING COUNT(*) > 1
    ;
    
    +-----------------------+----------+----------+----------+
    | textWithFoundItemInIt | COUNT(*) | theItem1 | theItem2 |
    +-----------------------+----------+----------+----------+
    | Item12 is a problem   |        2 | item1    | item12   |
    +-----------------------+----------+----------+----------+
    

    Now adjust the UPDATE to pick the MIN(mappingItem) per table1 row when assigning the new value:

    UPDATE table1
      JOIN ( SELECT textWithFoundItemInIt
                  , MIN(mappingItem) AS mappingItem
               FROM table1
               JOIN table2
                 ON table1.textWithFoundItemInIt LIKE CONCAT('%', table2.mappingItem, '%')
              GROUP BY table1.textWithFoundItemInIt 
           ) mt
        ON table1.textWithFoundItemInIt = mt.textWithFoundItemInIt 
       SET foundItem = mt.mappingItem
    ;
    

    Review the result:

    SELECT * FROM table1;
    
    +----------------------------+-----------+
    | textWithFoundItemInIt      | foundItem |
    +----------------------------+-----------+
    | hallo Item1                | item1     |
    | Item2 says hello           | item2     |
    | Item12 is a problem        | item1     |
    | Want to find "Item3"bdjehg | item3     |
    +----------------------------+-----------+
    

    Note: This updates all target rows, even the problem rows, based on the original request. This can be adjusted to only touch those rows which don't already have a foundItem set, WHERE foundItem IS NULL.