I have the following data:
ID | Date |
---|---|
1 | 2023-10-05 |
2 | 2023-10-05 |
3 | 2023-10-05 |
4 | 2023-10-05 |
1 | 2023-10-10 |
2 | 2023-10-10 |
3 | 2023-10-10 |
4 | 2023-10-10 |
What i'd like to see is
ID | Date | Previous Date |
---|---|---|
1 | 2023-10-05 | null |
2 | 2023-10-05 | null |
3 | 2023-10-05 | null |
4 | 2023-10-05 | null |
1 | 2023-10-10 | 2023-10-05 |
2 | 2023-10-10 | 2023-10-05 |
3 | 2023-10-10 | 2023-10-05 |
4 | 2023-10-10 | 2023-10-05 |
This is what I've tried:
select ID, Date, lag(Date) over (partition by Date order by Date)
from Table;
But the result just shows the date of the preceeding row, not the preceeding group (indicated in the PARTITION BY portion), so i get this result:
ID | Date | Previous Date |
---|---|---|
1 | 2023-10-05 | null |
2 | 2023-10-05 | 2023-10-05 |
3 | 2023-10-05 | 2023-10-05 |
4 | 2023-10-05 | 2023-10-05 |
1 | 2023-10-10 | 2023-10-05 |
2 | 2023-10-10 | 2023-10-10 |
3 | 2023-10-10 | 2023-10-10 |
4 | 2023-10-10 | 2023-10-10 |
How can I show the previous date by group ignoring ID in the same query without the need of subqueries or joins using SQL?
EDIT *********
I tried @The Impaler's answer below, it is working and bringing the correct values for most of the rows, however this is where it's failing:
Let's say we have ID from 1 to 4 for date 2023-10-05, and then for the next date 2023-10-10 we have ID from 1 to 6, so the partition by ID will only work if ID values repeat between month, but not for new IDs in that month not present in the previous month:
ID | Date | Previous Date |
---|---|---|
1 | 2023-10-05 | null |
2 | 2023-10-05 | null |
3 | 2023-10-05 | null |
4 | 2023-10-05 | null |
1 | 2023-10-10 | 2023-10-05 |
2 | 2023-10-10 | 2023-10-05 |
3 | 2023-10-10 | 2023-10-05 |
4 | 2023-10-10 | 2023-10-05 |
5 | 2023-10-10 | null |
6 | 2023-10-10 | null |
Thanks
From the apache documentation it seems this is supported by the base dialect. I use a platform built on spark with additional bells and whistles so not 100% sure.
https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-window.html
last_distinct_date_value gives how I'm interpreting the question.
last_date_by_id is the existing answer for comparison.
with some_sample_data as (
SELECT data.id, _date
FROM VALUES (1, cast('2023-10-05' as date)),
(2, cast('2023-10-05' as date)),
(3, cast('2023-10-05' as date)),
(4, cast('2023-10-05' as date)),
(1, cast('2023-10-09' as date)),
(1, cast('2023-10-10' as date)),
(2, cast('2023-10-10' as date)),
(3, cast('2023-10-10' as date)),
(4, cast('2023-10-10' as date)),
(5, cast('2023-10-10' as date)),
(6, cast('2023-10-10' as date)),
(7, cast('2023-10-15' as date))
AS data(id, _date)
)
select id,
_date,
max(_date) over(
partition by 1
order by _date
range between unbounded preceding and 1 preceding
) as last_distinct_date_value,
lag(_date) over(
partition by id
order by _date
) as last_date_by_id
from some_sample_data
id _date last_distinct_date_value last_date_by_id
1 2023-10-05 - -
2 2023-10-05 - -
3 2023-10-05 - -
4 2023-10-05 - -
1 2023-10-09 2023-10-05 2023-10-05
1 2023-10-10 2023-10-09 2023-10-09
2 2023-10-10 2023-10-09 2023-10-05
3 2023-10-10 2023-10-09 2023-10-05
4 2023-10-10 2023-10-09 2023-10-05
5 2023-10-10 2023-10-09 -
6 2023-10-10 2023-10-09 -
7 2023-10-15 2023-10-10 -