I have a set of data showing the deals of each company and the date when the deal happened.
I create a new column of the date 5 years before the deal happened (deal_date_5y_back
).
company_id | deal_id | deal_date | deal_date_5y_back |
---|---|---|---|
86 | 1 | 2006-02-27 | 2001-02-27 |
86 | 2 | 2012-04-01 | 2007-04-01 |
86 | 3 | 2015-03-23 | 2010-03-23 |
86 | 4 | 2018-09-19 | 2013-09-19 |
I want to create a new column that counts the number of deals between the deal_date
and the deal_date_5yb
, for each company and each deal_date
.
Essentially I want to write something like in the count over partition by
clause below, but I am not sure how to do it.
select
company_id
, deal_date
, count(distinct deal_id) over (partition by company_id, deal_date where deal_date between deal_date_5yb and dateadd(day, -1, deal_date) ) num_deals_5y_back
from
(select
company_id, deal_id, deal_date,
dateadd(year, -5, deal_date) deal_date_5y_back
from
DEALS_TABLE
where
deal_date is not null) T
group by
company_id, deal_date
The output I expect is:
company_id | deal_id | deal_date | deal_date_5yb | num_deals_5yb |
---|---|---|---|---|
86 | 1 | 2006-02-27 | 2001-02-27 | 0 |
86 | 2 | 2012-04-01 | 2007-04-01 | 0 |
86 | 3 | 2015-03-23 | 2010-03-23 | 1 |
86 | 4 | 2018-09-19 | 2013-09-19 | 0 |
Any help is appreciated!
In SQL Server, a lateral join might be the simplest approach:
select d.*, d1.*
from deals_table d
cross apply (
select count(*) num_deals_5y_back
from deals_table d1
where d1.company_id = d.company_id
and d1.deal_date >= dateadd(year, -5, d.deal_date)
and d1.deal_date < d.deal_date
) d1
For what it's worth: in databases that support standard date arithmetics in the range
clause of window functions, we could do this without a subquery:
select d.*,
count(*) over(
partition by company_id
order by deal_date
range between interval '5' year preceding and current row
) - 1 num_deals_5y_back
from deals_table d
This works in Postgres, and in Oracle as well - alas not in SQL Server, as the documentation points out:
<unsigned value specification> FOLLOWING
Specified with to indicate the number of rows or values to follow the current row. [..]. This specification is not allowed for RANGE.
Attempting to force the use of such clause raises the following error message:
RANGE is only supported with UNBOUNDED and CURRENT ROW window frame delimiters