I need to split values of two columns in a table with separated values.
My source table looks like:
product_id | ean_upc | is_approved |
---|---|---|
2102961 | 8710103368021;8710103368038;8710103368014 | 1;0;1 |
92923683 | 0195161923722;195161923722;10195161923729 | 1;1;0 |
656364 | 8710103203919;8710103239093;8710103203902 | 1;1;1 |
735699 | 8710103285885;8710103295938;8710103295952 | 0;0;1 |
What I need is a select statement that give me back each product_id
and EAN
code
SELECT
product_id, EAN.value
FROM
IceCatUK.dbo.ice_indexCSV_tmp
CROSS APPLY
STRING_SPLIT(ean_upc , ';') EAN
The problem is that I need to have the is_approved
information too.
My final table needs to have Product_id
, the separated EAN_upc
and the is_approved
flag.
The is_approved
must be linked to the columns of the EAN
The result for the first line should look like this:
product_id | ean_upc | is_approved |
---|---|---|
2102961 | 8710103368021 | 1 |
2102961 | 8710103368038 | 0 |
2102961 | 8710103368014 | 1 |
How to link the position in the string_split(EAN)
with the position of string_split("is_approved")
?
Similar to @lemon's excellent answer, but gathering both EAN and APPROVED into separate result sets and joining on both product_id
and ordinal
to avoid the potentially large intermediate cartesian result set that is produced by two joins of Cross-Apply. Depending on the size of your data, this may be more performant.
SELECT ice_indexCSV_tmp.product_id,
EAN.value as ean_upc,
dt.is_approved
FROM ice_indexCSV_tmp
CROSS APPLY STRING_SPLIT(ean_upc, ';', 1) EAN
INNER JOIN
(
SELECT ice.product_id, APPROVED.value as is_approved, APPROVED.ordinal
FROM ice_indexCSV_tmp ice
CROSS APPLY STRING_SPLIT(is_approved, ';', 1) APPROVED
) dt
ON EAN.ordinal = dt.ordinal
AND ice_indexCSV_tmp.product_id = dt.product_id
Note that the enable_ordinal
parameter is only available in sql-server 2022 16.x or Azure databases. A terrible oversight that it wasn't included when string_split()
was initially rolled out.
The row_number
route in @charleschen's answer is likely the next best option, but because there is no guarantee about the order of the values spit out by the cross-apply and string_split()
it's a bit of a gamble. That being said, I've never seen SQL Server spit out string_split
results in a different order than the string that was fed into it.
ULTIMATELY, if I had an older version of sql-server where the enable_ordinal
was not available, and this was production level code, I would probably look to pre-processing the source data that was used to populate this table before ingesting into sql server. You could make quick work of this in a pandas dataframe with pandas.Series.explode()
.