Search code examples
postgresqlwindow-functions

How to calculate total revenue within 3 months in postgresql


Thanks in advance for any help.

I have a table with unique tickets, customer IDs and ticket price. For each ticket, I want to see the number of tickets and total revenue from a customer 3 months after the date of the ticket.

I tried to use the partition by function with the date condition set in the on clause, but it just evaluates all tickets of the customer rather than the 3 month period I want.

select distinct on (at2.ticket_number)
at2.customer_id 
,at2.ticket_id 
,at2.ticket_number 
,at2.initial_sale_date 
,ata.tix "a_tix"
,ata.aov "a_aov"
,ata.rev "a_rev"

from reports.agg_tickets at2 
left join (select at2.customer_id, at2.final_fare_value, at2.initial_sale_date, count(at2.customer_id)  OVER (PARTITION BY at2.customer_id) AS tix,
            avg(at2.final_fare_value) over (partition by at2.customer_id) as aov,
            sum(at2.final_fare_value) over (partition by at2.customer_id) as rev
            from reports.agg_tickets at2
            ) ata
            on (ata.customer_id = at2.customer_id 
            and ata.initial_sale_date > at2.initial_sale_date 
            and ata.initial_sale_date < at2.initial_sale_date  + interval '3 months')

I could use a left join lateral, but it takes far too long. Slightly confused with how to achieve what I want, so any help would be greatly appreciated.

Many thanks

Edit: Here is the sample of data. Picture of data table.

The table is unique on ticket number, but not on customer.


Solution

  • No need to use a join at all, this will yield (as you observe) a problemetic performnce. What is your solution is a plain window function with a frame_clause that will consider the next 3 months for each ticket

    Example (self explained)

    count(*) over (partition by customer_id order by initial_sale_date 
                   range between current row and '3 months'  following) ticket_cnt
    

    Here a full query with simplified sample data and the result

    with dt as (
    select * from  (values
    (1, 1, date'2020-01-01', 10),
    (1, 2, date'2020-02-01', 15),
    (1, 3, date'2020-03-01', 20),
    (1, 4, date'2020-04-01', 25),
    (1, 5, date'2020-05-01', 30),
    (2, 6, date'2020-01-01', 15),
    (2, 7, date'2020-02-01', 20),
    (2, 7, date'2021-01-01', 25)
    ) tab (customer_id, ticket_id, initial_sale_date,final_fare_value)
    )
    select 
    customer_id, ticket_id, initial_sale_date, final_fare_value,
    count(*) over (partition by customer_id order by initial_sale_date range between current row and '3 months'  following) ticket_cnt,
    sum(final_fare_value) over (partition by customer_id order by initial_sale_date range between current row and '3 months'  following) ticket_sum
    from dt;
    
    customer_id|ticket_id|initial_sale_date|final_fare_value|ticket_cnt|ticket_sum|
    -----------+---------+-----------------+----------------+----------+----------+
              1|        1|       2020-01-01|              10|         4|        70|
              1|        2|       2020-02-01|              15|         4|        90|
              1|        3|       2020-03-01|              20|         3|        75|
              1|        4|       2020-04-01|              25|         2|        55|
              1|        5|       2020-05-01|              30|         1|        30|
              2|        6|       2020-01-01|              15|         2|        35|
              2|        7|       2020-02-01|              20|         1|        20|
              2|        7|       2021-01-01|              25|         1|        25|