Search code examples
sqlpostgresqlgoogle-bigqueryaggregate-functionswindow-functions

Overdue evolution in SQL


Despite my efforts I cannot think of a way to answer my need :

I have 2 tables containing respectively a set of loans to be reimbursed, and a set of reimbursements on these loans (not all loans have an entry in reimbursement table because some of them remain unpaid).

Table loan has columns:

  • id
  • amount
  • due_date

Table reimbursement has columns:

  • id
  • debt_id
  • payment_date

My goal is to obtain the amount that was overdue for more than 5 days for each date in the past.

Here is where I am at (in Postgres grammar):

with due_for_5_days_by_date AS (
    select 
    due_date + interval '5' DAY as due_date,
    sum(amount) as amount
    FROM loan
    GROUP BY due_date
),
paid_by_date as (
    select 
        payment_date,
        sum(amount) as amount
    FROM reimbursement
    join loan on loan.id = loan_id
    GROUP BY payment_date
),
cumulated as (
    SELECT
        COALESCE(due_date, payment_date) as date,
        SUM(COALESCE(paid_by_date.amount, 0)) over (order by COALESCE(due_date, payment_date)) as paid,
        SUM(COALESCE(due_for_5_days_by_date.amount, 0)) over (order by COALESCE(due_date, payment_date)) as due_5d
    FROM due_for_5_days_by_date
    FULL OUTER JOIN paid_by_date on due_date = payment_date
)
select 
    DATE,
    due_5d - paid as overdue_5d
from cumulated
order by date

Explanation :

  • First I get the amount due for exactly 5 days for each date
  • Then, I get the amount paid for each date
  • Then I sum the amounts obtained at 1. and 2. to get the cumulative due and paid amounts
  • Then by difference I get the overdue amount

But the problem is that, by doing this, the amount paid by date also contains reimbursement of loans expired for less than 5 days. In other words the paid amount for each date in the final query is overestimated as it should exclude the amount received on loans recently expired on that date. I cannot think of a way to fix this.

Can you think of a way to obtain what I want ?

Thanks a lot for your help !


Solution

  • I just found this way of doing it, but it uses a cross join, so not sure it will scale well on my whole data, even though it is hosted on BigQuery:

    with all_dates as (
        select distinct payment_date as date
        from reimbursement
        union
        select distinct due_date as date
        from loan
        union
        select distinct due_date + interval '5' day as date
        from loan
    ), 
    joined_data as (
        SELECT
            amount,
            due_date,
            payment_date
        from loan
        LEFT join reimbursement on loan.id = loan_id
    )
    select 
        date,
        SUM(
          CASE
            WHEN (due_date <= DATE - INTERVAL '5' day) and (payment_date > DATE or payment_date is null) then amount
            else 0
          END
        ) as overdue_5d
    FROM all_dates
    cross join joined_data
    group by date
    order by DATE