Search code examples
sqlsql-servert-sqlsplitapply

String_Split multiple columns


I need to split values of two columns in a table with separated values.

My source table looks like:

product_id ean_upc is_approved
2102961 8710103368021;8710103368038;8710103368014 1;0;1
92923683 0195161923722;195161923722;10195161923729 1;1;0
656364 8710103203919;8710103239093;8710103203902 1;1;1
735699 8710103285885;8710103295938;8710103295952 0;0;1

What I need is a select statement that give me back each product_id and EAN code

SELECT 
    product_id, EAN.value
FROM
    IceCatUK.dbo.ice_indexCSV_tmp
CROSS APPLY
    STRING_SPLIT(ean_upc , ';') EAN

The problem is that I need to have the is_approved information too.

My final table needs to have Product_id, the separated EAN_upc and the is_approved flag.

The is_approved must be linked to the columns of the EAN

The result for the first line should look like this:

product_id ean_upc is_approved
2102961 8710103368021 1
2102961 8710103368038 0
2102961 8710103368014 1

How to link the position in the string_split(EAN) with the position of string_split("is_approved") ?


Solution

  • Similar to @lemon's excellent answer, but gathering both EAN and APPROVED into separate result sets and joining on both product_id and ordinal to avoid the potentially large intermediate cartesian result set that is produced by two joins of Cross-Apply. Depending on the size of your data, this may be more performant.

    SELECT ice_indexCSV_tmp.product_id, 
      EAN.value as ean_upc,
      dt.is_approved
    FROM ice_indexCSV_tmp
       CROSS APPLY STRING_SPLIT(ean_upc, ';', 1) EAN
       INNER JOIN 
           ( 
                SELECT ice.product_id, APPROVED.value as is_approved, APPROVED.ordinal 
                FROM ice_indexCSV_tmp ice 
                    CROSS APPLY STRING_SPLIT(is_approved, ';', 1) APPROVED 
          ) dt
         ON EAN.ordinal = dt.ordinal
            AND ice_indexCSV_tmp.product_id = dt.product_id
    

    dbfiddle version

    Note that the enable_ordinal parameter is only available in sql-server 2022 16.x or Azure databases. A terrible oversight that it wasn't included when string_split() was initially rolled out.

    The row_number route in @charleschen's answer is likely the next best option, but because there is no guarantee about the order of the values spit out by the cross-apply and string_split() it's a bit of a gamble. That being said, I've never seen SQL Server spit out string_split results in a different order than the string that was fed into it.

    ULTIMATELY, if I had an older version of sql-server where the enable_ordinal was not available, and this was production level code, I would probably look to pre-processing the source data that was used to populate this table before ingesting into sql server. You could make quick work of this in a pandas dataframe with pandas.Series.explode().