I'd like to count distinct over two columns.
In imaginary sql, I'd write the below:
COUNT(DISTINCT col1, col2)
In T-SQL, this is done by having one grouping in a sub query and then a second grouping in the outer query - like this
As far as I know, USQL doesn't have sub queries, so I'm a bit stuck trying to work out how to do this. I suppose I could use two aliased queries, but I was wondering if there was a better way.
U-SQL does support subqueries. What it does not (yet) support are scalar subqueries and correlated subqueries (use a join for both instead).
So the solution
SELECT COUNT(*) AS count
FROM (SELECT DISTINCT DocumentId, DocumentSessionId
FROM DocumentOutputItems) AS internalQuery
or the mentioned SUM(CASE)
with GROUP BY
in the link above both work.
Alternatively, U-SQL also allows you to name the inner query and use it in the outer statement which makes the script easier to read for more complex cases (it is similar to a common-table expression in that it does only name the query and does not execute it):
@inner =
SELECT DISTINCT DocumentId, DocumentSessionId
FROM DocumentOutputItems;
@result = SELECT COUNT(*) AS count FROM @inner;