I have a table in SQL Server with two columns as shown here, all number are comma-separated:
IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL
DROP TABLE #TempTable
CREATE TABLE #TempTable
(
[Sample] varchar(33),
Benchmark varchar(33)
)
INSERT INTO #TempTable VALUES ('9,55,66', '55,44');
INSERT INTO #TempTable VALUES ('88,23,2,3,4', '23,88');
I need to match the data in Sample
column with the Benchmark
, then calculate the missing number ratio:
Let me explain:
Sample
and Benchmark
(2-1)/2 = 50%
of them are missingI don't need column C, D ,E to be stored in the table, just a explanation purpose. WHAT I need is the column F.
I am using SQL Server 2019+.
First things first, you're going to need a way to uniquely identify each of your rows. I could make the assumption that sample
uniquely identifies the rows, but I'm sort of guessing there might be something like an identity
on the table. if there isn't one, make one using row_number()
. I simply added an identity called Rid
to your temp table.
The reason you'll need this is you're going to split the samples and the benchmarks into two CTEs, call string_split
on each of them, and then join them back together on the split value and the row identifier (in my example, RID
, but again, ifsample
is a valid key, that works too.
The samples
CTE is going to look like this:
select RID, Value = trim(b.value)
from #tmp a
cross apply string_split(a.sample, ',') b
and the benchmarks
CTE is going to look like this:
select RID, Value = trim(b.value)
from #tmp a
cross apply string_split(a.benchmark, ',') b
I'm showing you these separately so you can see what each CTE contains. I'm also trimming the value in case there is whitespace in there. You could go a step further and cast them to int
if you want/need, but I'll leave that as an exercise for the reader.
Finally, left join everything back together on RID
and Value
, grouping by RID
(i.e. the original row you were testing). The Benchmark count is just the total count of rows (or you could use bm.value
; it's the same thing since benchmarks
is your left table here) and the match count is the count of s.value
. This works, because if there isn't a match, you'll get a null, and count(s.value)
will skip any s.value
where it's null.
;with samples as
(
select RID, Value = trim(b.value)
from #tmp a
cross apply string_split(a.sample, ',') b
), benchmarks as
(
select RID, Value = trim(b.value)
from #tmp a
cross apply string_split(a.benchmark, ',') b
)
select
bm.RID,
BenchmarkCount = count(1),
MatchCount = count(s.value),
MissingRatio = 100 - convert(decimal(9,3), (100.0 * count(s.Value)) / nullif(count(1), 0))
from benchmarks bm
left outer join samples s
on bm.RID = s.RID
and bm.Value = s.Value
group by bm.RID
From there, you can format your results however you like.