I am trying to find out how I can see in my table where the same set of numbers appears.
Eg. I have a table with 10 columns numbered 1 -10. each column has a different number from 1 - 10.
I want to see if there is the same number set in any rows.
Eg.
Row 1 : 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Row 2 : 2, 3, 4, 5, 6, 7, 8, 9, 10, 1
These are different but they have the same numbers in them so I want to remove any number sets that are similar in this regard.
You can you can use a combination of CROSS APPLY
, VALUES
, and STRING_AGG
to unpivot and aggregate the values into a single ordered comma-separated list that uniquely defines the set.
From there, you can use a COUNT(*)
window function to count the duplicates, ROW_NUMBER()
to number the duplicates, and even DENSE_RANK()
to number the distinct sets. One or more of these values can be used to filter the results as desired.
WITH CombinedValues AS (
SELECT
D.*,
VL.ValueList,
DENSE_RANK() OVER(ORDER BY VL.ValueList) AS SetNumber,
COUNT(*) OVER(PARTITION BY VL.ValueList) AS DupCount,
ROW_NUMBER() OVER(PARTITION BY VL.ValueList ORDER BY D.Id) AS RowNum
FROM Data D
CROSS APPLY (
SELECT STRING_AGG(V.Value, ',') WITHIN GROUP(ORDER BY V.Value) AS ValueList
FROM (
VALUES
(Col1), (Col2), (Col3), (Col4), (Col5),
(Col6), (Col7), (Col8), (Col9), (Col10)
) V(Value)
) VL
)
SELECT *
FROM CombinedValues CV
--WHERE CV.DupCount >= 2 -- Uncomment to only show the duplicates
--WHERE CV.RowNum = 1 -- Uncomment to only show The first of each distinct set
ORDER BY ValueList
Sample results (with some extra test data):
Id | Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 | Col9 | Col10 | ValueList | Set Number |
Dup Count |
Row Num |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 1,2,3,4,5,6,7,8,9,10 | 1 | 2 | 1 |
2 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 1 | 1,2,3,4,5,6,7,8,9,10 | 1 | 2 | 2 |
5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 6 | 5,5,5,5,5,5,5,5,5,6 | 2 | 1 | 1 |
3 | 5 | 6 | 5 | 6 | 5 | 6 | 5 | 6 | 5 | 6 | 5,5,5,5,5,6,6,6,6,6 | 3 | 2 | 1 |
4 | 5 | 5 | 5 | 5 | 5 | 6 | 6 | 6 | 6 | 6 | 5,5,5,5,5,6,6,6,6,6 | 3 | 2 | 2 |
If duplicate values are present within a set, two sets must have the same number of each value to be considered a match. If you only care about distinct values, you can modify the above query with:
...
CROSS APPLY (
-- Distinct values
SELECT STRING_AGG(DV.Value, ',') WITHIN GROUP(ORDER BY DV.Value) AS ValueList
FROM (
SELECT DISTINCT Value
FROM (
VALUES
(Col1), (Col2), (Col3), (Col4), (Col5),
(Col6), (Col7), (Col8), (Col9), (Col10)
) V(Value)
) DV
) VL
...
Results:
Id | Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 | Col9 | Col10 | ValueList | Set Number |
Dup Count |
Row Num |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 1,2,3,4,5,6,7,8,9,10 | 1 | 2 | 1 |
2 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 1 | 1,2,3,4,5,6,7,8,9,10 | 1 | 2 | 2 |
3 | 5 | 6 | 5 | 6 | 5 | 6 | 5 | 6 | 5 | 6 | 5,6 | 2 | 3 | 1 |
4 | 5 | 5 | 5 | 5 | 5 | 6 | 6 | 6 | 6 | 6 | 5,6 | 2 | 3 | 2 |
5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 6 | 5,6 | 2 | 3 | 3 |
See this db<>fiddle for a demo.