Search code examples
sqlhivesqlhelper

How to assign a value to the next following column based on logic of preceding values in SQL HIVE?


I am trying to find a way to assign a value to the next following column based on logic of preceding values. For example, lets say we have a table with the column temp. If temp goes above as 95 then the table should display as "System Off", if the next value is greater than 80, then the status column still displays as "System Off", this status will change only when the next value is less than 80. See for example:

Location ID Event Time Temp System Status: Desired Column (valid_consecutive_values)
99999999 4:18:58 PM 80.3 "System On"
99999999 4:21:03 PM 70.5 "System On"
99999999 4:21:42 PM 96.2 "System Off"
99999999 4:25:04 PM 95.3 "System Off"
99999999 4:25:40 PM 82.1 "System Off"
99999999 4:25:45 PM 79.0 "System On"
99999999 4:26:14 PM 95.0 "System Off"
99999999 4:26:23 PM 72.0 "System On"

Here is the SQL I have tried, but it does not work as I am hoping it will.

> ```
> WITH input_data AS (
  SELECT *,
         CASE
           WHEN Temp >= 95 THEN 'System Off'
           ELSE 'System On'
         END AS valid_current_value
  FROM table
),
consecutive_values AS (
  SELECT *,
         SUM(CASE
               WHEN valid_current_value = 'System Off' THEN 1
               ELSE 0
             END)
           OVER (ORDER BY Event_Time
                 ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS consecutive_falses
  FROM input_data
),
consecutive_values_above_80 AS (
  SELECT *,
         SUM(CASE
               WHEN Temp >= 80 THEN 1
               ELSE 0
             END)
           OVER (ORDER BY Event_Time
                 ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS consecutive_values_above_80
  FROM consecutive_values
)
SELECT *,
       CASE
         WHEN consecutive_falses >= 2 OR consecutive_values_above_80 >= 2 THEN 'System Off'
         ELSE 'System On'
       END AS valid_consecutive_values
FROM consecutive_values_above_80
> ```

Logic: If current cell is 95 then "System Off". If next value is greater than 80 then 'System Off'. This logic will continue until it finds a value within the next row less than 80. Once we see a less than 80 then the status changes to 'System On'. Please note that this logic will only work if in the preceding values there was a 95 and the following values are greater than 80.

DESIRED OUTPUT EXAMPLE


Solution

  • This is a gaps and islands problem, you could try the following to solve it, the explanation within the query comments:

    with gaps_islands as
    (
      select *,
       sum(case when temp >= 95 then 1 else 0 end) over 
      (partition by LocationID order by EventTime) grp
      /*This is to create groups whenever a temp value goes above 95 */
      from tbl_name
    )
    select LocationID, EventTime, Temp,
      case 
        when 
          max(temp) over (partition by LocationID, grp) >=95 -- Check that there is a temp value >= 95 within the group 
          and temp >=80 -- The current temp >=80
          and min(temp) over (partition by LocationID, grp order by EventTime) >=80 -- Check that the last min temp value is >= 80 
         then 'System Off' else 'System On' 
      end as SystemStatus
    from gaps_islands
    order by EventTime
    

    Check this demo (on MySQL) based on the sample data provided in the posted image.