Search code examples
sqljoinstring-matching

SQL Joining on Field with Nulls


I'm trying to match two tables where one of the tables stores multiple values as a string.

In the example below I need to classify each product ordered from the #Orders table with a #NewProduct.NewProductId.

The issue I'm having is sometimes we launch a new product like "Black Shirt", then later we launch an adaption to that product like "Black Shirt Vneck".

I need to match both changes correctly to the #Orders table. So if the order has Black and Shirt, but not Vneck, it's considered a "Black Shirt", but if the order has Black and Shirt and Vneck, it's considered a "Black Vneck Shirt."

The code below is an example - the current logic I'm using returns duplicates with the Left Join. Also, assume we can modify the format of #NewProducts but not #Orders.

IF              OBJECT_ID('tempdb.dbo.#NewProducts') IS NOT NULL DROP TABLE #NewProducts
CREATE TABLE    #NewProducts 
(
      ProductType VARCHAR(MAX)
    , Attribute_1 VARCHAR(MAX)
    , Attribute_2 VARCHAR(MAX)
    , NewProductId INT
)

INSERT      #NewProducts
VALUES
    ('shirt', 'black', 'NULL', 1),
    ('shirt', 'black', 'vneck', 2),
    ('shirt',  'white', 'NULL', 3)


IF              OBJECT_ID('tempdb.dbo.#Orders') IS NOT NULL DROP TABLE #Orders
CREATE TABLE    #Orders
(
      OrderId INT
    , ProductType VARCHAR(MAX)
    , Attributes    VARCHAR(MAX)
)

INSERT  #Orders
VALUES
    (1, 'shirt', 'black small circleneck'),
    (2, 'shirt', 'black large circleneck'),
    (3, 'shirt', 'black small vneck'),
    (4, 'shirt', 'black small vneck'),
    (5, 'shirt', 'white large circleneck'),
    (6, 'shirt', 'white small vneck')

SELECT      *
FROM        #Orders o
        LEFT JOIN #NewProducts np
            ON o.ProductType = np.ProductType
            AND CHARINDEX(np.Attribute_1, o.Attributes) > 0
            AND (
                    CHARINDEX(np.Attribute_2, o.Attributes) > 0
                OR np.Attribute_2 = 'NULL'
                )

Solution

  • You seem to want the longest overlap:

    SELECT *
    FROM #Orders o OUTER APPLY
         (SELECT Top (1) np.*
          FROM #NewProducts np
          WHERE o.ProductType = np.ProductType AND
                CHARINDEX(np.Attribute_1, o.Attributes) > 0
          ORDER BY ((CASE WHEN CHARINDEX(np.Attribute_1, o.Attributes) > 0 THEN 1 ELSE 0 END) +
                    (CASE WHEN CHARINDEX(np.Attribute_2, o.Attributes) > 0 THEN 1 ELSE 0 END)
                   ) DESC
         ) np;
    

    I can't say I'm thrilled with the need to do this. It seems like the Orders should contain numeric ids that reference the actual product. However, I can see how something like this is sometimes necessary.