Search code examples
sqldata-miningstackexchangedataexplorer

Query for fastest growing tags in data explorer?


I'd like to compare the popularity of tags between two months, ordered by the biggest change.

I've worked out how to count the number of tags in a month, but not how to compare them. Here's what I have so far:

select TOP 10
    tags.tagname, count(*) AS tagcount
from Posts
    INNER JOIN PostTags ON PostTags.PostId = Posts.id
    INNER JOIN Tags     ON Tags.id         = PostTags.TagId
where
    datepart(year,  Posts.CreationDate) = 2011 and
    datepart(month, Posts.CreationDate) = 1
Group by tags.tagname
Order by tagcount DESC

https://data.stackexchange.com/stackoverflow/qe/924/query-count-tags-from-daterange
(note: you can clone, edit in place, and run it)

I'm new to SQL. It seems I just need to create a second query for the previous month (December 2010), and then combine these two queries, with a column that is prevMonth.count - nextMonth.count, and order by that column (getting just the top X, so it doesn't take forever).

But I can't work out how to combine two queries in this way - I think you should be able to nest them, but I can't get it to work. Another way is to create a temporary table - it seems inefficient to me, but maybe that is the right way?

Many thanks for any help!


BTW: what I'd like to do next:

  1. After this, I'd like to find the rate of growth (not just the absolute change in numbers). That's just (new-old)/old. Like velocity, but normalized.

  2. Then, the tags whose rate of growth is increasing the most - i.e. that have exponential growth. This is would require 3 months: calculate the rate of change between months 1 and 2, and between months 2 and 3. The difference between these is the rate of change of the rate of change. It's like acceleration.

[ This is as a signal for new technologies growing, which often start small in a very specific usage. The people in that small area talk to each other, and good ideas get passed on in a chain-reaction: one person tells two, they each tell two more and so on. The that niche might get converted fully after a while, and maybe it spreads to another, similar niche. See "Seeing What's Next", by the Innovator's Dilemma guy. ]


Here's a later version, using JNK's answer: https://data.stackexchange.com/stackoverflow/q/92869/query-tags-with-highest-increase-in-growth-over-3-months

And... the fastest growing tag is... facebook-c#-sdk. Dunno how useful this is, but it's an interesting way to browse SO.


Solution

  • Use #Temp Tables:

    -- QUERY: count tags from daterange
    -- TODO: compare from two different dateranges...
    
    select TOP 10
        tags.tagname, count(*) AS tagcount
    INTO #TagCountTemp1
    from Posts
        INNER JOIN PostTags ON PostTags.PostId = Posts.id
        INNER JOIN Tags     ON Tags.id         = PostTags.TagId 
    where
        datepart(year,  Posts.CreationDate) = 2011 and
        datepart(month, Posts.CreationDate) = 1
    Group by tags.tagname
    Order by tagcount DESC
    
    select TOP 10
        tags.tagname, count(*) AS tagcount
    INTO #TagCountTemp2
    from Posts
        INNER JOIN PostTags ON PostTags.PostId = Posts.id
        INNER JOIN Tags     ON Tags.id         = PostTags.TagId 
    where
        datepart(year,  Posts.CreationDate) = 2010 and
        datepart(month, Posts.CreationDate) = 12
    Group by tags.tagname
    Order by tagcount DESC
    
    SELECT TOP 10
    t2.tagname, t2.tagcount as 'Month 1', t1.tagcount as 'Month 2', (t1.tagcount-t2.tagcount) as 'Increase'
    FROM #TagCountTemp1 as t1
    LEFT JOIN #TagCountTemp2 as t2
      ON T1.tagname = t2.tagname
    ORDER BY (t1.tagcount-t2.tagcount) desc​​​​​​​​​​​​​​​
    

    ​This worked fine for me!