sql sql-server t-sql count window-functions

List or concatenate in SQL window function

A (relatively new to SQL) member of my team was working on writing an SQL query that happened to use a window function. Upon reviewing I noted that they structured their window function like this:

COUNT(*) OVER(PARTITION BY Part1+Part2) AS A

Which I instantly made a feedback note to say it should be like this:

COUNT(*) OVER(PARTITION BY Part1, Part2) AS A

Both Part1 and Part2 are nvarchars.

Then I paused to reflect and I couldn't actually work out why that would be wrong. As far as I can see that would actually produce identical results (it does). The actual execution plan is nearly identical aside from an extra Compute Scalar step after the initial table scan on the first query (this is 0% of query cost). The I/O statistics show that the first version has 5 fewer logical reads (12,665 to 12,670).

So is there any benefit/detriment to using either form, aside from coding conventions? Is it a case that this works fine in this instance, but in certain circumstances could produce inconsistent results?

Solution

Both expressions are valid, but they do not do the same thing.

Consider the following data:

Part1    Part2
AB       C
A        BC

When concatenating strings with PARTITION BY Part1+Part2 both records fall in the same partition, whereas when using PARTITION BY Part1, Part2, they would belong to different partitions.

So the question actually comes down to: what is the correct partitioning criteria for your use case? Usually, unless you are doing something fancy, you want PARTITION BY Part1, Part2. But this actually has to be answered from functional perspective, based on your real use case.