Search code examples
t-sqlsubquerymaxrow-number

How to return the Value from Column A on the row where the Max of Column B is found in a subquery


I have time series data and I am creating a rolling average over a 90 sample window (15 min). I then want to find the max 15 min window each hour so the data is grouped by Hour. However I want to also return the start or end time stamp of the max 15 min window. Currently I am only returning the time stamp at the start of each hour.

Here is an example of the data that I start with:

theDate    theTime     value
4/30/2019   22:47:53    0
4/30/2019   22:47:43    0
4/30/2019   22:47:33    0
4/30/2019   22:47:23    0
4/30/2019   22:47:13    0
4/30/2019   22:47:03    0
4/30/2019   22:46:53    0
4/30/2019   22:46:43    0

I have tried row_number() and filtered where but I don't think I am applying them correctly for this as there are several nested sub-queries.

select
        min([theDate]) as theDate, 
        min([theTime]) as theTime, 
        max([Value]) as maxValue, 
        max([rolling_avg]) as maxDM,
        timeHour as timeHour

from( select [theDate], [theTime], [Value],
       avg(windowAvg) over(order by theDate DESC, theTime rows between 90 preceding and current row) as rolling_avg,
       datepart(hh,theTime) as timeHour
  from (select [theDate], [theTime], [Value], sum([Value]) as windowAvg
          from [Data].[dbo].[tOne]
          Where ([theDate] > convert(DAte,DATEADD(month, -1, GETDATE())))
          group by theDate, theTime, Value 

          )tOneTemp   
          )tOneTempTwo
          group by theDate, timeHour
          order by theDate DESC, theTime DESC

I am looking for the table below except that instead of theTime always being the min per hour as I have it calculated I would like it to be a value within the 15 min window of that the MaxDM was calculated.

Here is an example of what I end up with using my current Code:

theDate    theTime   maxValue   maxDM       timeHour
2019-04-30  22:00:04    508.8   660.643956  22
2019-04-30  21:00:03    1071.3  798.206593  21
2019-04-30  20:00:03    1022.2  817.539560  20
2019-04-30  19:00:04    871.4   574.786813  19
2019-04-30  18:00:04    944.0   670.095604  18

What I would like to see is something like this such that "theTime" reflects the start of the "maxValue" window.

Here is an example of what I would like to see (*note theTime):

theDate    theTime   maxValue   maxDM       timeHour
2019-04-30  22:10:34    508.8   660.643956  22
2019-04-30  21:45:03    1071.3  798.206593  21
2019-04-30  20:12:03    1022.2  817.539560  20
2019-04-30  19:32:04    871.4   574.786813  19
2019-04-30  18:56:04    944.0   670.095604  18

Thanks Ali here are the few mods I made to your code to fix some errors but the "TheTimeOfMaxDM" returns as "NULL" for all rows.

                    min([theDate]) as theDate, 
                    min([theTime]) as theTime, 
                    (select top(1) r.[theTime] from RunningAvg15Min r where 
                        r.[theDate]=(select min([theDate]) from RunningAvg15Min) and
                        r.[theTime]>=(select min([theTime]) from RunningAvg15Min)
                        and r.rolling_avg=(select max([rolling_avg]) from RunningAvg15Min)) as TheTimeOfMaxDM,
                    max([KW]) as maxValue, 
                    max([rolling_avg]) as maxDM,
                    timeHour as timeHour
                from RunningAvg15Min
                GROUP BY theDate, timeHour
                ORDER BY thedate DESC, thetime DESC

Solution

  • I rewrote your query in a CTE so I can use the subquery again. please check the below:- so I added another output to your query called TheTimeOfMaxDM. hope this is what you need.

    ;with LastMonthData as (
                   --get the data for the last month, and sum value if same time stamp is recorded (this is not logical)
                   select [theDate], [theTime], [Value], sum([Value]) as windowAvg
                      from [tOne]
                      Where ([theDate] > convert(Date,DATEADD(month, -1, '2019-05-3')))--changed this to match the data provided.
                      group by theDate, theTime, Value 
              ),
    RunningAvg15Min as (
                        select [theDate], [theTime], [Value],
                        avg(windowAvg) over(order by theDate DESC, theTime rows between 90 preceding and current row) as rolling_avg,
                        datepart(hh,theTime) as timeHour
                    from LastMonthData
                  ),DataSetGrouping as (
                  select
                        min([theDate]) as theDate, 
                        min([theTime]) as theTime, 
                        min(datepart(hh,theTime)) as timeHour,
                        max([Value]) as maxValue, 
                        max([rolling_avg]) as maxDM
                    from RunningAvg15Min
                    GROUP BY
                        DATEPART(YEAR, TheDate),
                        DATEPART(MONTH, TheDate),
                        DATEPART(DAY, TheDate),
                        DATEPART(HOUR, TheTime),
                        (DATEPART(MINUTE, TheTime) / 60) ---change the 60 to 15, to change the grouping from per hour to per 15min
    
                    )
                    select * ,
                     (select top(1) r.[theTime] from RunningAvg15Min r where 
                            r.[theDate]=theDate
                            and r.[theTime]>=theTime
                            and r.rolling_avg=maxDM
                            order by r.[theDate],r.[theTime]
                            )  [TheTimeOfMaxDM]
                    from DataSetGrouping
                    ORDER BY thedate DESC, thetime DESC
    

    Another note, I dont know if you need this or not , but if you wanted to group by 15 min, you could use the below in the grouping section

    GROUP BY
        DATEPART(YEAR, TheDate),
        DATEPART(MONTH, TheDate),
        DATEPART(DAY, TheDate),
        DATEPART(HOUR, TheTime),
        (DATEPART(MINUTE, TheTime) / 15)
    

    I will try to validate a point, lets take for example the output row below:-

    theDate     theTime             Value   rolling_avg         timeHour
    2019-04-07  17:20:49.0000000    398.3   314.499999999997    17
    

    On 2019-04-07 between hour 17 & 18, the max rolling_avg is 314.499999999997, and the time for that max was 17:20:49.0000000, we can see that be replacing the last part of the cte with the following:-

    select * from RunningAvg15Min where rolling_avg between 314 and 315
                    and thedate='2019-04-07' and theTime between '17:00:00' and '18:00:00'
    

    output

    theDate      theTime            Value   rolling_avg         timeHour
    2019-04-07  17:20:49.0000000    398.3   314.499999999997    17
    

    To summarize my answer, you can use CTE's in the fashion from the query I wrote to simplify the sub queries and to reference them more than once in your query if needed. hope this will help.