Search code examples
sqlsql-serverdate-arithmetic

Sql query to select the records for past 60 seconds and compare the temperature of the selected records and if any record has higher value then ignore


I am trying to eliminate the data anomalies in the data I am receiving from eventhub and send only selected data to azure functions through Azure stream analytics for that I am writing a sql query where I need some help

Requirement: I need to collect the past 60 seconds data and need to group by Id and compare the records that I received in the 60 seconds and If any record value is way higher than the selected values than ignore that record (for example, I will collect the 4 records in past 60 seconds and if the data is 40 40 40 40 5. We should drop the 5. Example 2 - 20 20 20 500 drop the 500. ).

My sql table will be something like this:

id  Temp    date    datetime
123 30  2023-01-01  2023-01-01 12:00:00
124 35  2023-01-01  2023-01-01 12:00:00
123 31  2023-01-01  2023-01-01 12:00:00
123 33  2023-01-01  2023-01-01 12:00:00
123 60  2023-01-01  2023-01-01 12:00:00
124 36  2023-01-01  2023-01-01 12:00:00
124 36  2023-01-01  2023-01-01 12:00:00
124 8   2023-01-01  2023-01-01 12:00:00
124 36  2023-01-01  2023-01-01 12:00:00

I need to eliminate the records that are not in the range with the other records


Solution

  • I'll leave the details of the comparison up to you, but you can use a CROSS APPLY to gather the data for comparison.

    Something like:

    SELECT *
    FROM TemperatureData T
    CROSS APPLY (
        SELECT AVG(T2.Temp * 1.0) AS PriorAvgTemp, COUNT(*) As PriorCount
        FROM TemperatureData T2
        WHERE T2.id = T.id
        AND T2.datetime >= DATEADD(second, -60, T.datetime)
        AND T2.datetime < T.datetime
    ) P
    WHERE T.Temp BETWEEN P.PriorAvgTemp - 10 AND P.PriorAvgTemp + 10
    --OR P.PriorCount < 3 -- Should we allow if there is insufficient prior data
    --AND P.PriorCount >= 3 -- Should we omit if there is insufficient prior data
    

    Be sure you have an index on TemperatureData(id, datetime).

    If you are willing to accept the last N values instead of a time range, windowed aggregate calculation may be more efficient.

    SELECT *
    FROM (
        SELECT *,
            AVG(T.Temp * 1.0)
                OVER(PARTITION BY id ORDER BY datetime
                     ROWS BETWEEN 60 PRECEDING AND 1 PRECEDING)
                AS PriorAvgTemp,
            COUNT(*) 
                OVER(PARTITION BY id ORDER BY datetime
                     ROWS BETWEEN 60 PRECEDING AND 1 PRECEDING)
                AS PriorCount
        FROM TemperatureData T
    ) TT
    WHERE TT.Temp BETWEEN TT.PriorAvgTemp - 10 AND TT.PriorAvgTemp + 10
    --OR TT.PriorCount < 3 -- Should we allow if there is insufficient prior data
    --AND TT.PriorCount >= 3 -- Should we omit if there is insufficient prior data
    

    Please note: The above is untested code, which may need some syntax fixes and debugging. If you discover errors, please comment and I will correct the post.