Search code examples
sqlgoogle-bigquery

How to find difference in values between the first and last day of the month in Google BigQuery?


I have a dataset that looks similar to this:

ID Timestamp Value 1 Value 2
1 2023-03-01 05:00:00 UTC 1200 3.2
1 2023-03-15 05:00:00 UTC 1400 2.4
1 2023-03-31 05:00:00 UTC 2000 2.8
2 2023-03-01 05:00:00 UTC 850 4
2 2023-03-15 05:00:00 UTC 1250 4.1
2 2023-03-31 05:00:00 UTC 3300 3.6

Where Value 1 and Value 2 are taken once a day at the same time for each ID.

Value 1 is an increasing number throughout the month while Value 2 is not.

I would like to find the difference in Value 1 between the last day and first day of every month for each ID.

Additionally, I would like to find the maximum Value 2 in every month for each ID.

Would it be easier to create two separate queries for each Value column or is there a way to do each task in the same query?


Solution

  • Assuming your table name is Table.

    First create a CTE ProcessedInfo, with some additional colums to make our calculation easier, for which we will use Window Functions.

    The three new colums are as follows:

    1. Value1_FirstDay_Month - The Value1 value for the first day of each month of each year. For this will use FIRST_VALUE and partition by year and month, and then order by timestamp in ascending order
    2. Value1_LastDay_Month - Same as the above, except its the Value1 for the last day of each month and year. We will use LAST_VALUE for this one, with same partitioning and ordering as the previous column.
    3. Value2_Max_ID_Month - The maximum Value2, for each ID,Year and Month.

    And then, a second CTE FinalResult, where we use the columns created in the first CTE, to calculate the desired result.

    1. All colums of the original table.
    2. Value1_Diff - The difference between Value1_LastDay_Month and Value1_FirstDay_Month
    3. Value2_Max_ID_Month - Same as above. No calculation necessary as this is the desired result.
    WITH ProcessedInfo AS (
    
    SELECT
     *,
     FIRST_VALUE(Value1) OVER(PARTITION BY EXTRACT(YEAR FROM Timestamp), EXTRACT(MONTH FROM Timestamp) ORDER BY Timestamp ASC) AS Value1_FirstDay_Month,
      LAST_VALUE(Value1) OVER(PARTITION BY EXTRACT(YEAR FROM Timestamp), EXTRACT(MONTH FROM Timestamp) ORDER BY Timestamp ASC) AS Value1_LastDay_Month,
     MAX(Value2) OVER(PARTITION BY ID, EXTRACT(YEAR FROM Timestamp), EXTRACT(MONTH FROM Timestamp)) AS Value2_Max_ID_Month
    FROM Table
    ),
    
    FinalResult AS (
     SELECT ID, Timestamp, Value1, Value2,
      (Value1_LastDay_Month - Value1_FirstDay_Month) AS Value1_Diff,
       Value2_Max_ID_Month
     FROM ProcessedInfo
    )
    
    SELECT * FROM FinalResult;