Search code examples
sqlgoogle-bigquerycumulative-sum

Calculate cumulative sum in with restart in SQL based on condition


I am trying to create a cumulative sum column, which increments by 1 every time a payment column has a value of zero. That portion was fairly simple to figure out, what I can't seem to figure out is how to restart the cumulative sum back to zero once the value is not equal to zero.

This is an example of what I need to achieve - look at the CONSECUTIVE_MTHS_UNPAID field enter image description here

The SQL I've tried is as follows:

SELECT
  CUSTOMER,
  CONTRACT_NO,
  INVOICE_DATE,
  PAYMENT,
  SUM(CASE 
  WHEN PAYMENT = 0 THEN 1 
  WHEN PAYMENT < 0 THEN 0 END) OVER (PARTITION BY CUSTOMER, CONTRACT_NO ORDER BY INVOICE_DATE) AS CONSECUTIVE_MTHS_UNPAID
FROM
  project_one.database_one.contracts_table

I need help figuring this out

EDIT here is the data in text format

CUSTOMER|CONTRACT_NO|INVOICE_DATE|PAYMENT|CONSECUTIVE_MTHS_UNPAID
ABC     |1          |2010-02-09  |-300   |0
ABC     |1          |2010-03-08  |-600   |0
ABC     |1          |2010-04-08  |0      |1
ABC     |1          |2010-05-08  |-300   |0
ABC     |1          |2010-06-08  |-550   |0
ABC     |1          |2010-07-08  |-250   |0
ABC     |1          |2010-08-09  |-250   |0
ABC     |1          |2010-09-08  |-250   |0
ABC     |1          |2010-10-08  |-250   |0
ABC     |1          |2010-11-08  |-250   |0
ABC     |1          |2010-12-08  |-250   |0
ABC     |1          |2011-01-08  |-250   |0
ABC     |1          |2011-02-08  |0      |1
ABC     |1          |2011-03-08  |0      |2
ABC     |1          |2011-04-08  |0      |3
ABC     |1          |2011-05-08  |-250   |0
ABC     |1          |2011-06-08  |-250   |0
ABC     |1          |2011-07-08  |-435.57|0

Solution

  • You can try this:

    WITH SampleData AS 
    (
      SELECT 'ABC' AS CUSTOMER,
             '1' AS CONTRACT_NO,
             DATE '2010-02-09' AS INVOICE_DATE,
            -300 AS PAYMENT
      UNION ALL
      SELECT 'ABC', '1', DATE '2010-03-08', -600
      UNION ALL
      SELECT 'ABC', '1', DATE '2010-04-08', 0
      UNION ALL
      SELECT 'ABC', '1', DATE '2010-05-08', -300
      UNION ALL
      SELECT 'ABC', '1', DATE '2010-06-08', -550
      UNION ALL
      SELECT 'ABC', '1', DATE '2010-07-08', -250
      UNION ALL
      SELECT 'ABC', '1', DATE '2010-08-09', -250
      UNION ALL
      SELECT 'ABC', '1', DATE '2010-09-08', -250
      UNION ALL
      SELECT 'ABC', '1', DATE '2010-10-08', -250
      UNION ALL
      SELECT 'ABC', '1', DATE '2010-11-08', -250
      UNION ALL
      SELECT 'ABC', '1', DATE '2010-12-08', -250
      UNION ALL
      SELECT 'ABC', '1', DATE '2011-01-08', -250
      UNION ALL
      SELECT 'ABC', '1', DATE '2011-02-08', 0
      UNION ALL
      SELECT 'ABC', '1', DATE '2011-03-08', 0
      UNION ALL
      SELECT 'ABC', '1', DATE '2011-04-08', 0
      UNION ALL
      SELECT 'ABC', '1', DATE '2011-05-08', -250
      UNION ALL
      SELECT 'ABC', '1', DATE '2011-06-08', -250
      UNION ALL
      SELECT 'ABC', '1', DATE '2011-07-08', -435.57 
    ),
    data_with_groups AS 
    (
      SELECT *
           ,CASE WHEN PAYMENT = 0 THEN 1 ELSE 0 END AS payment_flag
           ,ROW_NUMBER() OVER (PARTITION BY CUSTOMER, CONTRACT_NO ORDER BY INVOICE_DATE) - ROW_NUMBER() OVER (PARTITION BY CUSTOMER, CONTRACT_NO, CASE WHEN PAYMENT = 0 THEN 1 ELSE 0 END ORDER BY INVOICE_DATE) AS reset_group
      FROM SampleData 
    )
    SELECT CUSTOMER,
           CONTRACT_NO,
           INVOICE_DATE,
           PAYMENT,
           SUM(payment_flag) OVER (PARTITION BY CUSTOMER, CONTRACT_NO, reset_group ORDER BY INVOICE_DATE) AS consecutive_mths_unpaid
    FROM data_with_groups
    ORDER BY CUSTOMER,
             CONTRACT_NO,
             INVOICE_DATE;
    

    enter image description here

    The first part of the query is used to defined when a reset flag occurs and to group the in reset groups:

      SELECT *
           ,CASE WHEN PAYMENT = 0 THEN 1 ELSE 0 END AS payment_flag
           ,ROW_NUMBER() OVER (PARTITION BY CUSTOMER, CONTRACT_NO ORDER BY INVOICE_DATE) - ROW_NUMBER() OVER (PARTITION BY CUSTOMER, CONTRACT_NO, CASE WHEN PAYMENT = 0 THEN 1 ELSE 0 END ORDER BY INVOICE_DATE) AS reset_group
      FROM SampleData 
    

    enter image description here

    The we just need to sum the reset flags for each groups:

    SELECT CUSTOMER,
           CONTRACT_NO,
           INVOICE_DATE,
           PAYMENT,
           SUM(payment_flag) OVER (PARTITION BY CUSTOMER, CONTRACT_NO, reset_group ORDER BY INVOICE_DATE) AS consecutive_mths_unpaid
    FROM data_with_groups
    ORDER BY CUSTOMER,
             CONTRACT_NO,
             INVOICE_DATE;