I have a dataset that looks similar to this:
ID | Timestamp | Value 1 | Value 2 |
---|---|---|---|
1 | 2023-03-01 05:00:00 UTC | 1200 | 3.2 |
1 | 2023-03-15 05:00:00 UTC | 1400 | 2.4 |
1 | 2023-03-31 05:00:00 UTC | 2000 | 2.8 |
2 | 2023-03-01 05:00:00 UTC | 850 | 4 |
2 | 2023-03-15 05:00:00 UTC | 1250 | 4.1 |
2 | 2023-03-31 05:00:00 UTC | 3300 | 3.6 |
Where Value 1
and Value 2
are taken once a day at the same time for each ID
.
Value 1
is an increasing number throughout the month while Value 2
is not.
I would like to find the difference in Value 1
between the last day and first day of every month for each ID
.
Additionally, I would like to find the maximum Value 2
in every month for each ID
.
Would it be easier to create two separate queries for each Value column or is there a way to do each task in the same query?
Assuming your table name is Table
.
First create a CTE ProcessedInfo
, with some additional colums to make our calculation easier, for which we will use Window Functions.
The three new colums are as follows:
Value1_FirstDay_Month
- The Value1 value for the first day of each month of each year. For this will use FIRST_VALUE
and partition by year and month, and then order by timestamp in ascending orderValue1_LastDay_Month
- Same as the above, except its the Value1 for the last day of each month and year. We will use LAST_VALUE
for this one, with same partitioning and ordering as the previous column.Value2_Max_ID_Month
- The maximum Value2, for each ID,Year and Month.And then, a second CTE FinalResult
, where we use the columns created in the first CTE, to calculate the desired result.
Value1_Diff
- The difference between Value1_LastDay_Month
and Value1_FirstDay_Month
Value2_Max_ID_Month
- Same as above. No calculation necessary as this is the desired result.WITH ProcessedInfo AS (
SELECT
*,
FIRST_VALUE(Value1) OVER(PARTITION BY EXTRACT(YEAR FROM Timestamp), EXTRACT(MONTH FROM Timestamp) ORDER BY Timestamp ASC) AS Value1_FirstDay_Month,
LAST_VALUE(Value1) OVER(PARTITION BY EXTRACT(YEAR FROM Timestamp), EXTRACT(MONTH FROM Timestamp) ORDER BY Timestamp ASC) AS Value1_LastDay_Month,
MAX(Value2) OVER(PARTITION BY ID, EXTRACT(YEAR FROM Timestamp), EXTRACT(MONTH FROM Timestamp)) AS Value2_Max_ID_Month
FROM Table
),
FinalResult AS (
SELECT ID, Timestamp, Value1, Value2,
(Value1_LastDay_Month - Value1_FirstDay_Month) AS Value1_Diff,
Value2_Max_ID_Month
FROM ProcessedInfo
)
SELECT * FROM FinalResult;