Tags: sql, sql-server-2017, dataexplorer

Using Stack Exchange Data Explorer (SEDE) to find users by post count and reputation


I want to find the users who have the most reputation with the fewest posts (fewer than 10). Why can't I put a WHERE clause before this JOIN, as in the commented-out line below?

SELECT TOP 100 Users.Id, Users.DisplayName AS [Username], Users.Reputation, COUNT(Posts.Id) AS [Post Count] FROM Users
-- WHERE COUNT(Posts.Id) < 10
JOIN Posts ON Posts.OwnerUserId = Users.Id 
GROUP BY Users.Id, Users.DisplayName, Users.Reputation
ORDER BY Users.Reputation DESC;

The original user post count example query is at data.stackexchange.com/stackoverflow/query/503051


Solution

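  • That is what the HAVING clause (MS reference) is for. WHERE filters individual rows before GROUP BY runs, so it cannot reference an aggregate such as COUNT(); HAVING filters the groups afterwards, once the aggregates have been computed.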

    You would use:

    SELECT TOP 100 Users.Id, Users.DisplayName AS [Username], Users.Reputation, COUNT(Posts.Id) AS [Post Count] FROM Users
    JOIN Posts ON Posts.OwnerUserId = Users.Id
    GROUP BY Users.Id, Users.DisplayName, Users.Reputation
    HAVING COUNT(Posts.Id) < 10
    ORDER BY Users.Reputation DESC;
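
    The two clauses can also appear together: conditions on individual rows go in WHERE, conditions on aggregates go in HAVING. A minimal sketch, assuming a reputation floor of 1000 (an arbitrary value chosen for illustration, not part of the original question):

    ```sql
    SELECT TOP 100
           Users.Id,
           Users.DisplayName AS [Username],
           Users.Reputation,
           COUNT(Posts.Id)   AS [Post Count]
    FROM   Users
    JOIN   Posts ON Posts.OwnerUserId = Users.Id
    WHERE  Users.Reputation >= 1000   -- row filter, applied before grouping (1000 is an assumed example value)
    GROUP BY Users.Id, Users.DisplayName, Users.Reputation
    HAVING COUNT(Posts.Id) < 10       -- group filter, applied after COUNT() is computed
    ORDER BY Users.Reputation DESC;
    ```

    Moving the COUNT() condition into the WHERE clause would fail, because an aggregate is not available at the point where rows are filtered.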
    

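    Here is the same idea leveraging a few SEDE features (##name:type?default## parameters and a site:// user link), and using a LEFT JOIN so that users with no posts are included too: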

    -- maxRows: How many rows to return:
    -- maxPosts: Maximum number of posts a user can have:
    
    SELECT TOP ##maxRows:INT?100##
                'site://u/' + CAST(u.Id AS NVARCHAR) + '|' + u.DisplayName  AS [User]
                , u.Reputation
                , COUNT (p.Id)  AS [Post Count]
    FROM        Users u
    LEFT JOIN   Posts p         ON (p.OwnerUserId = u.Id  AND  p.PostTypeId IN (1, 2) )  -- Q & A only
    GROUP BY    u.Id
                , u.DisplayName
                , u.Reputation
    HAVING      COUNT (p.Id) <= ##maxPosts:INT?10##
    ORDER BY    u.Reputation DESC
                , [Post Count]
                , u.DisplayName
    

    You can see it live in SEDE.
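
    The LEFT JOIN only works here because the query counts p.Id rather than using COUNT(*). A small self-contained sketch of the difference, using made-up inline rows instead of the real SEDE tables (user 1 has two posts, user 2 has none):

    ```sql
    -- Hypothetical inline data, for illustration only.
    SELECT      u.Id
                , COUNT (p.Id)  AS [Count of p.Id]   -- NULLs from unmatched rows are not counted
                , COUNT (*)     AS [Count of rows]   -- the unmatched row itself is still counted
    FROM        (VALUES (1), (2))            AS u (Id)
    LEFT JOIN   (VALUES (10, 1), (11, 1))    AS p (Id, OwnerUserId)
                ON p.OwnerUserId = u.Id
    GROUP BY    u.Id
    ORDER BY    u.Id;
    ```

    User 1 gets 2 in both columns. User 2 gets 0 for COUNT (p.Id) but 1 for COUNT (*), so only the former reports users without posts correctly.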

    I particularly like the users with higher rep that have no posts.
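
    To list only those users, one option is to replace the join with NOT EXISTS. This is a sketch rather than the query linked above, and it assumes the same Users and Posts columns used in the query above:

    ```sql
    -- maxRows: How many rows to return:

    SELECT TOP ##maxRows:INT?100##
                'site://u/' + CAST(u.Id AS NVARCHAR) + '|' + u.DisplayName  AS [User]
                , u.Reputation
    FROM        Users u
    WHERE       NOT EXISTS (
                    SELECT 1
                    FROM   Posts p
                    WHERE  p.OwnerUserId = u.Id
                      AND  p.PostTypeId IN (1, 2)   -- Q & A only, as above
                )
    ORDER BY    u.Reputation DESC
                , u.DisplayName
    ```

    No GROUP BY or HAVING is needed in this form, because nothing is being aggregated.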