Search code examples
sqlsql-serversubquerywindow-functions

T-SQL query to count number of rows where a date is bound between dates defined in other columns


I have a set of data showing the deals of each company and the date when the deal happened.

I create a new column of the date 5 years before the deal happened (deal_date_5y_back).

company_id deal_id deal_date deal_date_5y_back
86 1 2006-02-27 2001-02-27
86 2 2012-04-01 2007-04-01
86 3 2015-03-23 2010-03-23
86 4 2018-09-19 2013-09-19

I want to create a new column that counts the number of deals between the deal_date and the deal_date_5yb, for each company and each deal_date.

Essentially I want to write something like in the count over partition by clause below, but I am not sure how to do it.

select 
    company_id
    , deal_date
    , count(distinct deal_id) over (partition by company_id, deal_date where deal_date between deal_date_5yb and dateadd(day, -1, deal_date) ) num_deals_5y_back
from 
    (select 
         company_id, deal_id, deal_date, 
         dateadd(year, -5, deal_date) deal_date_5y_back
     from 
         DEALS_TABLE
     where 
         deal_date is not null) T
group by 
    company_id, deal_date

The output I expect is:

company_id deal_id deal_date deal_date_5yb num_deals_5yb
86 1 2006-02-27 2001-02-27 0
86 2 2012-04-01 2007-04-01 0
86 3 2015-03-23 2010-03-23 1
86 4 2018-09-19 2013-09-19 0

Any help is appreciated!


Solution

  • In SQL Server, a lateral join might be the simplest approach:

    select d.*, d1.*
    from deals_table d
    cross apply (
        select count(*) num_deals_5y_back
        from deals_table d1
        where d1.company_id = d.company_id
          and d1.deal_date >= dateadd(year, -5, d.deal_date)
          and d1.deal_date <  d.deal_date
    ) d1
    

    Demo on DB Fiddlde


    For what it's worth: in databases that support standard date arithmetics in the range clause of window functions, we could do this without a subquery:

    select d.*,
        count(*) over(
            partition by company_id
            order by deal_date
            range between interval '5' year preceding and current row
        ) - 1 num_deals_5y_back
    from deals_table d
    

    This works in Postgres, and in Oracle as well - alas not in SQL Server, as the documentation points out:

    <unsigned value specification> FOLLOWING

    Specified with to indicate the number of rows or values to follow the current row. [..]. This specification is not allowed for RANGE.

    Attempting to force the use of such clause raises the following error message:

    RANGE is only supported with UNBOUNDED and CURRENT ROW window frame delimiters