I have a "package rank" based on a certain number of times a unique customer id has been sent a package.
row_number() over (partition by package.customer_id
order by ship_date.shipped_date) as package_rank
The output returned is something like this:
+------------+-----------+-----+
|customer_id | ship_date | rank|
+------------+-----------+-----+
| sam | 8/20/2019 | 1 |
| sam | 9/20/2019 | 2 |
| sam | 9/23/2019 | 3 |
| tim | 9/20/2019 | 1 |
| tim | 10/18/2019| 2 |
+------------+-----------+-----+
Since, it is unlikely that we would have shipped another complete box within 3 days, like in the case of sam, I would not want to include that shipment. I would only want to have the rank include shipment dates that are at least 28 days later than the previous ship date. Please let me know what the best way to go about this is! Thank you in advance.
Use a window function to exclude such rows from the list:
SELECT customer_id,
ship_date,
row_number()
OVER (PARTITION BY customer_id
ORDER BY ship_date) AS rank
FROM (SELECT customer_id,
ship_date,
lag(ship_date)
OVER (PARTITION BY customer_id
ORDER BY ship_date) AS prev_ship_date
FROM package) AS p1
WHERE (prev_ship_date + 28 > ship_date) IS NOT FALSE
ORDER BY rank;