Search code examples
sqlsoql

SQL query for getting data for the last 6 months grouped by month?


I know a basic query to get some results for the last 6 months. Let's say like this:

SELECT * 
FROM RANDOM_TABLE
WHERE Date_Column >= DATEADD(MONTH, -6, GETDATE())

But what if I'd like to get results grouped by month - each month looking back 6 months into the past? The first three rows of a result could ideally look like this (count of IDs is random):

Month_and_year COUNT(ID)
January 2017 120
February 2017 160
March 2017 240

The last three rows:

Month_and_year COUNT(ID)
November 2021 80
December 2021 350
January 2021 260

Hope it's understandable. Thanks in advance!


Solution

  • EDIT: Over the hours I made a few corrections. Most notably I corrected the self join query to reflect my intentions and also added more details to better explain what is going on.

    To my knowledge there are two ways about it (which are probably the same under the hood).

    Also, please note that these solutions assume you have a month field already in place. If you have a date or timestamp field, you should take one extra preparation step.

    [Addendum] To be more precise, I'd say that the ideal would be to have a date/timestamp field that is truncated/flattened to the first day of the month.

    As an example,

    month amount
    2021-01-01 50
    2021-02-01 20
    2021-03-01 10
    2021-04-01 100
    2021-05-01 20
    2021-06-01 40
    2021-07-01 80
    2021-08-01 50

    The first is to use a "self-non-equi join"

    SELECT
        a.month,
        SUM(b.amount) AS amount_over_6_months
    FROM table AS a
        INNER JOIN table AS b ON a.month BETWEEN b.month AND DATEADD(MONTH, 5, b.month)
    WHERE a.month >= DATEADD(MONTH, -5, GETDATE())
    GROUP BY a.month
    

    What happens here is that you are joining the table with itself. Specifically, for each row in the (a) alias, you will join six rows from the (b) alias. For each row you will join the rows where the month is equal, all the way back to five months prior. So...

    a.month b.month a.amount b.amount
    2021-01-01 2021-01-01 50 50
    2021-02-01 2021-01-01 20 50
    2021-02-01 2021-02-01 20 20
    2021-03-01 2021-01-01 10 50
    2021-03-01 2021-02-01 10 20
    2021-03-01 2021-03-01 10 10
    2021-04-01 2021-01-01 100 50
    2021-04-01 2021-02-01 100 20
    2021-04-01 2021-03-01 100 10
    2021-04-01 2021-04-01 100 100
    2021-05-01 2021-01-01 20 50
    2021-05-01 2021-02-01 20 20
    2021-05-01 2021-03-01 20 10
    2021-05-01 2021-04-01 20 100
    2021-05-01 2021-05-01 20 20
    2021-06-01 2021-01-01 40 50
    2021-06-01 2021-02-01 40 20
    2021-06-01 2021-03-01 40 10
    2021-06-01 2021-04-01 40 100
    2021-06-01 2021-05-01 40 20
    2021-06-01 2021-06-01 40 40
    2021-07-01 2021-02-01 80 20
    2021-07-01 2021-03-01 80 10
    2021-07-01 2021-04-01 80 100
    2021-07-01 2021-05-01 80 20
    2021-07-01 2021-06-01 80 40
    2021-07-01 2021-07-01 80 80
    ... ... ... ...

    Then it's just a matter of grouping based on the month in the (a) alias, and summing the amounts coming from the (b) alias.

    The advantage of this approach is that it should be vendor and generation agnostic, save the DATEADD() fucuntion.

    The second solution would be to use window functions. I cannot comment on whether this would work with your vendor and the specific version.

    SELECT 
        month,
        SUM(amount) OVER (ORDER BY month ROWS BETWEEN 5 PRECEDING AND CURRENT ROW)
    FROM table