Search code examples
sqldatetimehivewindow-functionsdatediff

SQL: Difference between consecutive rows


Table with 3 columns: order id, member id, order date

enter image description here

Need to pull the distribution of orders broken down by No. of days b/w 2 consecutive orders by member id

What I have is this:

SELECT 
  a1.member_id,
  count(distinct a1.order_id) as num_orders, 
  a1.order_date, 
  DATEDIFF(DAY, a1.order_date, a2.order_date) as days_since_last_order
from orders as a1 
inner join orders as a2 
  on a2.member_id = a1.member_id+1;

It's not helping me completely as the output I need is:

enter image description here


Solution

  • You can use lag() to get the date of the previous order by the same customer:

    select o.*,
        datediff(
            order_date,
            lag(order_date) over(partition by member_id order by order_date, order_id)
        ) days_diff
    from orders o
    

    When there are two rows for the same date, the smallest order_id is considered first. Also note that I fixed your datediff() syntax: in Hive, the function just takes two dates, and no unit.

    I just don't get the logic you want to compute num_orders.