Search code examples
sqldataexplorer

How to use the Stack Exchange Data Explorer to find every user's top tags?


I use this query in the Stack Exchange Data Explorer:

select id, reputation
from users
where reputation >300000

What should I add to this query in order to have an additional column with the "best" tag of the user?

That is the tag where the user has the highest score and/or the user has the most posts (questions, answers).
For example, in my user page, in top tags, I can see .


Solution

  • Here's one way using Common Table Expressions (CTE's) and Correlated Subqueries.

    See it in SEDE at: data.stackexchange.com/stackoverflow/query/815284/...

    WITH tagAndUserStats AS (
        SELECT
                    t.tagname,
                    p.owneruserid       AS userid,
                    SUM (p.score)       AS tagScore,
                    COUNT (p.id)        AS postCount
        FROM        posts       p
        INNER JOIN  posttags    pt      ON pt.postid = COALESCE (p.parentid, p.id)
        INNER JOIN  tags        t       ON pt.tagid = t.id
        WHERE       p.owneruserid >= 1
        AND         p.posttypeid IN (1, 2)
        GROUP BY    t.tagname,
                    p.owneruserid
    )
    SELECT
                u.id as [User Link],
                u.Reputation,
                (   SELECT TOP 1    tu.tagname
                    FROM            tagAndUserStats tu
                    WHERE           tu.userid = u.id
                    ORDER BY        tu.tagScore DESC
                ) AS [Top Tag by Score],
                (   SELECT TOP 1    tu.tagname
                    FROM            tagAndUserStats tu
                    WHERE           tu.userid = u.id
                    ORDER BY        tu.postCount DESC
                ) AS [Top Tag by Posts]
    FROM        users u
    WHERE       u.reputation > 300000
    ORDER BY    u.reputation DESC
    


    Which returns results like:

    User Link       Reputation   Top Tag by Score   Top Tag by Posts
    Jon Skeet         1010838            c#                 c#
    BalusC             784437          java                jsf
    Darin Dimitrov     783553            c#                 c#
    VonC               753855           git                git
    

    Notes:

    1. Reference the SEDE schema.
    2. Because you specified both questions and answers, and didn't exclude community wiki posts, this query's results will sometimes be off from what the customary values are.
      For badges and whatnot, question-posts and question-scores don't count, only answers do.