sql sql-server join aggregate-functions sql-server-2017

SQL Server join on result of aggregate function

SQL Server 2017. I have 2 tables, with hacker name and id, and another coding challenges submitted by each (below). I need to output id, name and count of challenges, filtering out those hackers who submitted the same number of challenges, except when this number is a maximum.
Here is sample data and final output i need

Hackers:

hacker_id name
1 john
2 tom
3 anna
4 mary
5 steve

Challenges:

challenge_id    hacker_id
1   1
2   1
3   1
4   2
5   2
6   2
7   2
8   3
9   3
10  3
11  4
12  4
13  4
14  4
15  5
16  5

this is the count of challenges per person (from here, we see that max number is 4 per person) :

hacker_id   name    count of challenges
1   john    3
2   tom     4
3   anna    3
4   mary    4
5   steve   2

The final output would be as follows:

hacker_id   name    count of challenges
2   tom     4
4   mary    4
5   steve   2

i.e. tom and mary both submitted 4 challenges. They are included because although the number 4 repeats, it's a maximum John and Anna submitted 3 each. They are excluded because 3 is not a max per person. Steve submitted 2 and this number is unique, so he's included, too.

Here is my code:

SELECT h.hacker_id, 
h.name, 
COUNT(c.challenge_id) AS ChalCountPerHead
FROM    hackers h 
        JOIN challenges c ON h.hacker_id = c.hacker_id
        LEFT JOIN ( 
            SELECT d.FreqHacker, COUNT(d.FreqHacker) as FreqOfFreq FROM 
                (SELECT hacker_id, COUNT(challenge_id) AS FreqHacker 
                 FROM Challenges GROUP BY hacker_id) d
            GROUP BY d.FreqHacker
        ) dd
        ON FreqHacker = COUNT(c.challenge_id)
GROUP BY h.hacker_id, h.name
HAVING 
COUNT(c.challenge_id) = (SELECT MAX(d.FreqHacker) from d) 
OR
dd.FreqOfFreq = 1

It doesn't work, stating an error message on this line

ON FreqHacker = COUNT(c.challenge_id)

An aggregate cannot appear in an ON clause unless it is in a sub query contained in a HAVING clause or select list, and the column being aggregated is an outer reference.

Solution

Here is one way to do it.

Having sample data in the question make it way easier to verify the solution. Please include it next time.

CTE is a simple aggregation to get the number of challenges submitted by each hacker.

In CTE2 MAX gives the global maximum frequency. HackerCountOfSameFreq is the count of hackers that have the same frequency.

Final WHERE removes groups of hackers that consist of more than 1 hacker, but leaves the group that has the maximum frequency.

Sample data

DECLARE @Hackers TABLE (hacker_id int, name varchar(50));
INSERT INTO @Hackers VALUES
(1, 'john'),
(2, 'tom'),
(3, 'anna'),
(4, 'mary'),
(5, 'steve');

DECLARE @Challenges TABLE (challenge_id int, hacker_id int);
INSERT INTO @Challenges VALUES
(1 ,  1),
(2 ,  1),
(3 ,  1),
(4 ,  2),
(5 ,  2),
(6 ,  2),
(7 ,  2),
(8 ,  3),
(9 ,  3),
(10,  3),
(11,  4),
(12,  4),
(13,  4),
(14,  4),
(15,  5),
(16,  5);

Query

WITH
CTE
AS
(
    SELECT hacker_id, COUNT(*) AS FreqHacker
    FROM @Challenges
    GROUP BY hacker_id
)
,CTE2
AS
(
    SELECT
        hacker_id
        ,FreqHacker
        ,COUNT(*) OVER (PARTITION BY FreqHacker) AS HackerCountOfSameFreq
        ,MAX(FreqHacker) OVER () AS GlobalMaxFreq
    FROM CTE
)

SELECT
    CTE2.hacker_id
    ,CTE2.FreqHacker
    ,H.Name
FROM
    CTE2
    INNER JOIN @Hackers AS H ON H.hacker_id = CTE2.hacker_id
WHERE
    HackerCountOfSameFreq = 1
    OR FreqHacker = GlobalMaxFreq
ORDER BY
    CTE2.hacker_id
;

Result

+-----------+------------+-------+
| hacker_id | FreqHacker | Name  |
+-----------+------------+-------+
| 2         | 4          | tom   |
+-----------+------------+-------+
| 4         | 4          | mary  |
+-----------+------------+-------+
| 5         | 2          | steve |
+-----------+------------+-------+

Your query also produces correct result (at least with this sample data), once the syntax is fixed.

I've split it into CTEs, leaving most of your logic as is:

WITH
d
AS
(
    SELECT hacker_id, COUNT(challenge_id) AS FreqHacker 
    FROM @Challenges 
    GROUP BY hacker_id
)
,dd
AS
(
    SELECT d.FreqHacker, COUNT(d.FreqHacker) as FreqOfFreq 
    FROM d
    GROUP BY d.FreqHacker
)
,d3
AS
(
    SELECT 
        h.hacker_id, 
        h.name, 
        COUNT(c.challenge_id) AS ChalCountPerHead
    FROM    
        @hackers h 
        JOIN @challenges c ON h.hacker_id = c.hacker_id
    GROUP BY h.hacker_id, h.name
)
,d4
AS
(
    SELECT *
    FROM 
        d3
        LEFT JOIN dd ON dd.FreqHacker = ChalCountPerHead
)
SELECT *
FROM d4
WHERE
    ChalCountPerHead = (SELECT MAX(d.FreqHacker) from d)
    OR FreqOfFreq = 1
ORDER BY hacker_id
;