Search code examples
sqlsql-servert-sqldataexplorer

Why is this votes/posts ratio always 1?


I'm working with SEDE to create a graph of the ratio of votes to posts. Having eliminated all the actual errors, I'm faced with a new problem: for some reason, the ratio is always 1. This is the current SQL:

SELECT CAST(p.CreationDate AS DATE) AS [CreationDate],
       COUNT(CAST(v.CreationDate AS DATE)) / COUNT(CAST(p.CreationDate AS DATE))
         AS [Ratio]
FROM Posts p
INNER JOIN Votes v ON v.PostId = p.Id
WHERE v.VoteTypeId = ##VoteType:int?2## AND
      p.PostTypeId = 1 OR p.PostTypeId = 2
GROUP BY CAST(p.CreationDate AS DATE)
ORDER BY Ratio

The query itself can be found here.

It was suggested in chat that this might be because joining the tables results in every possible combination, so the number of votes and posts is always the same (thus n/n = 1). Is this correct, and if so what should I be doing instead?


Solution

  • Since both sides of an in an inner join exist, both COUNT(CAST(v.CreationDate AS DATE)) and COUNT(CAST(p.CreationDate AS DATE))will return exactly the same number, which is the number of rows in a group*.

    If you would like to count how many new votes per new post you've got on a given date, use COUNT(DISTINCT):

    SELECT CAST(p.CreationDate AS DATE) AS [CreationDate],
           COUNT(DISTINCT v.Id) / COUNT(DISTINCT p.Id) AS [Ratio]
    FROM Posts p
    INNER JOIN Votes v ON v.PostId = p.Id
    WHERE v.VoteTypeId = ##VoteType:int?2## AND
          p.PostTypeId = 1 OR p.PostTypeId = 2
    GROUP BY CAST(p.CreationDate AS DATE)
    ORDER BY Ratio
    

    * assuming that CreationDate is not nullable.