There's a technique in Oracle SQL that can be used to simplify aggregation queries.
Scenario: We've aggregated on a particular column — but we actually want to know information from a different column.
There are a number of ways to achieve that using SQL. I'm looking for a solution that lets me do it in a calculated column -- all within a single SELECT query (no subqueries, joins, WITH, etc.).
Like this:
--Oracle
--For a given country, what city has the highest population? (where the country has more than one city)
--Include the city name as a column.
select
country,
count(*),
max(population),
any_value(city)
keep (dense_rank first order by population desc)
from
cities
group by
country
having
count(*) > 1
As shown above, the following column can bring in the city name, even though the city name isn't in the GROUP BY:
any_value(city) keep (dense_rank first order by population desc)
Is there equivalent functionality in SQL Server?
Related:
Edit:
I changed MAX()
to ANY_VALUE()
, since I think ANY_VALUE()
is easier to read.
Ties can be broken by adding , city desc
to the order by
, making it deterministic:
any_value(city) keep (dense_rank first order by population desc, city desc)
SQL Server does not implement this.
I'm looking for a solution that lets me do it in a calculated column -- all within a single SELECT query (no subqueries, joins, WITH, etc.)
It's possible but not pretty (DB Fiddle).
Assuming population is a positive integer then you could use the below (See Solution Based on Concatenation for the idea behind this approach)
select
country,
count(*),
max(population),
SUBSTRING(MAX(FORMAT(population, 'D10') + city), 11, 8000)
from
cities
group by
country
having
count(*) > 1