Search code examples
sqlpostgresqlranking

Fill table with last result


I have a table that contains the following information:

id | amount |   date   | customer_id
 1 |  0.00  | 11/12/17 | 1
 2 | 54.00  | 11/12/17 | 1
 3 | 60.00  | 02/12/18 | 1
 4 |  0.00  | 01/18/17 | 2
 5 | 14.00  | 03/12/17 | 2
 6 | 24.00  | 02/22/18 | 2
 7 |  0.00  | 09/12/16 | 3
 8 | 74.00  | 10/01/17 | 3

What I need it to look like is the following:

ranked_id | id | amount |   date   | customer_id
        1 |  1 |  0.00  | 11/12/17 | 1
        2 |  2 | 54.00  | 11/12/17 | 1            
        3 |  3 | 60.00  | 02/12/18 | 1
        4 |  3 | 60.00  | 02/12/18 | 1
        5 |  3 | 60.00  | 02/12/18 | 1
        6 |  3 | 60.00  | 02/12/18 | 1
        7 |  3 | 60.00  | 02/12/18 | 1
        8 |  4 |  0.00  | 01/18/17 | 2
        9 |  5 | 14.00  | 03/12/17 | 2
       10 |  6 | 24.00  | 02/22/18 | 2
       11 |  6 | 24.00  | 02/22/18 | 2
       12 |  6 | 24.00  | 02/22/18 | 2
       13 |  6 | 24.00  | 02/22/18 | 2
       14 |  6 | 24.00  | 02/22/18 | 2
       15 |  7 |  0.00  | 09/12/16 | 3
       16 |  8 | 74.00  | 10/01/17 | 3
       17 |  8 | 74.00  | 10/01/17 | 3
       18 |  8 | 74.00  | 10/01/17 | 3
       19 |  8 | 74.00  | 10/01/17 | 3
       20 |  8 | 74.00  | 10/01/17 | 3
       21 |  8 | 74.00  | 10/01/17 | 3

I know that there's something with partitioning and ranking (on the ranked_id), but I can't figure out how to repeat the last row 7 times.


Solution

  • As @Gordon Linoff suggested you can use the generate_series() function crossed with the distinct customer_ids to generate all the rows needed as in T1 below. Then in T2 (also below) the row_number function is used to generate a sequential value to outer join to from t1 along with the customer_id.

    From there it's just a matter of being able to get at the last value per customer_id when there is no original data to join to which is where the case statement and analytic first_value functions come in. I couldn't get the last_value analytic function to work likely due to postgresql's lack of an ignore nulls directive, so I used first_Value with a descending sort order, and only return the analytic value when no other data exists.

    with t1 as (
    select distinct 
           dense_rank() over (order by customer_id, generate_series) ranked_id
         , customer_id
         , generate_series
      from table1
      cross join generate_series(1,7)
    ), t2 as (
      select row_number() over (partition by customer_id order by id) rn
           , table1.*
        from table1
    )
    select t1.ranked_id
         , case when t2.customer_id is not null
                then t2.id
                else  first_value(t2.id)
                     over (partition by t1.customer_id
                           order by id desc nulls last)
           end id
         , case when t2.customer_id is not null
                then t2.amount
                else  first_value(t2.amount)
                     over (partition by t1.customer_id
                           order by id desc nulls last)
           end amount
         , case when t2.customer_id is not null
                then t2.date
                else  first_value(t2.date)
                     over (partition by t1.customer_id
                           order by id desc nulls last)
           end date
         , t1.customer_id
      from t1
      left join t2
        on t2.customer_id = t1.customer_id
       and t2.id = t1.generate_series
     order by ranked_id;
    

    Here's a SQL Fiddle demonstrating the code.