Despite my efforts I cannot think of a way to answer my need :
I have 2 tables containing respectively a set of loans to be reimbursed, and a set of reimbursements on these loans (not all loans have an entry in reimbursement table because some of them remain unpaid).
Table loan has columns:
Table reimbursement has columns:
My goal is to obtain the amount that was overdue for more than 5 days for each date in the past.
Here is where I am at (in Postgres grammar):
with due_for_5_days_by_date AS (
select
due_date + interval '5' DAY as due_date,
sum(amount) as amount
FROM loan
GROUP BY due_date
),
paid_by_date as (
select
payment_date,
sum(amount) as amount
FROM reimbursement
join loan on loan.id = loan_id
GROUP BY payment_date
),
cumulated as (
SELECT
COALESCE(due_date, payment_date) as date,
SUM(COALESCE(paid_by_date.amount, 0)) over (order by COALESCE(due_date, payment_date)) as paid,
SUM(COALESCE(due_for_5_days_by_date.amount, 0)) over (order by COALESCE(due_date, payment_date)) as due_5d
FROM due_for_5_days_by_date
FULL OUTER JOIN paid_by_date on due_date = payment_date
)
select
DATE,
due_5d - paid as overdue_5d
from cumulated
order by date
Explanation :
But the problem is that, by doing this, the amount paid by date also contains reimbursement of loans expired for less than 5 days. In other words the paid amount for each date in the final query is overestimated as it should exclude the amount received on loans recently expired on that date. I cannot think of a way to fix this.
Can you think of a way to obtain what I want ?
Thanks a lot for your help !
I just found this way of doing it, but it uses a cross join, so not sure it will scale well on my whole data, even though it is hosted on BigQuery:
with all_dates as (
select distinct payment_date as date
from reimbursement
union
select distinct due_date as date
from loan
union
select distinct due_date + interval '5' day as date
from loan
),
joined_data as (
SELECT
amount,
due_date,
payment_date
from loan
LEFT join reimbursement on loan.id = loan_id
)
select
date,
SUM(
CASE
WHEN (due_date <= DATE - INTERVAL '5' day) and (payment_date > DATE or payment_date is null) then amount
else 0
END
) as overdue_5d
FROM all_dates
cross join joined_data
group by date
order by DATE