Search code examples
sqlamazon-redshiftwindow-functions

row_number() over(partition by ____ order by____) but only when dates have elapsed a certain amount of time


I have a "package rank" based on a certain number of times a unique customer id has been sent a package.

row_number() over (partition by package.customer_id 
                   order by ship_date.shipped_date) as package_rank

The output returned is something like this:

+------------+-----------+-----+
|customer_id | ship_date | rank|
+------------+-----------+-----+
| sam        | 8/20/2019 |  1  |
| sam        | 9/20/2019 |  2  |
| sam        | 9/23/2019 |  3  | 
| tim        | 9/20/2019 |  1  |
| tim        | 10/18/2019|  2  |
+------------+-----------+-----+

Since, it is unlikely that we would have shipped another complete box within 3 days, like in the case of sam, I would not want to include that shipment. I would only want to have the rank include shipment dates that are at least 28 days later than the previous ship date. Please let me know what the best way to go about this is! Thank you in advance.


Solution

  • Use a window function to exclude such rows from the list:

    SELECT customer_id,
           ship_date,
           row_number()
               OVER (PARTITION BY customer_id
                     ORDER BY ship_date) AS rank
    FROM (SELECT customer_id,
                 ship_date,
                 lag(ship_date)
                    OVER (PARTITION BY customer_id
                          ORDER BY ship_date) AS prev_ship_date
           FROM package) AS p1
    WHERE (prev_ship_date + 28 > ship_date) IS NOT FALSE
    ORDER BY rank;