Search code examples
sqlsql-servert-sqlgroup-bysql-server-2016

TSQL GroupBy value every X seconds


I have this query:

DECLARE @NUM AS INT   -- IT INDICATE THE NUMBER OF SECONDS TO GROUP THE DATA
DECLARE @DATA_START AS DATETIME = DATEADD(yy, DATEDIFF(yy, 0, GETDATE()), 0)
DECLARE @DATA_END AS DATETIME = GETDATE()

SELECT AVG(D.VALORE)
FROM AA_V_DATI_MISURAZIONE D
WHERE D.STARTDATE >= @DATA_START AND D.ENDDATE <= @DATA_END

Now I need to extra the average of D.Valore grouped by every @NUM seconds.

Edit: so this is my query:

SELECT 
    DATEADD(day, DATEDIFF(SECOND, @DATA_START, D.DataOra) / @NUM, @DATA_START) AS DATAORA,
    AVG(D.Valore) AS MEDIAVALORE
FROM 
    AA_V_DatiMisurazione D 
WHERE 
    D.STARTDATE >= @DATA_START 
    AND D.STARTDATE <= @DATA_END 
GROUP BY 
    DATEDIFF(SECOND, @DATA_START, D.DataOra) / @NUM
ORDER BY 
    DATAORA

Solution

  • Honestly, the simpliest method would be to just use DATE_BUCKET (assuming you are using the latest version of SQL Server, which you don't say you aren't):

    SELECT DATE_BUCKET(SECOND, @NUM, STARTDATE) AS Second_Bucket,
           AVG(D.VALORE) AS AvgValore
    FROM AA_V_DATI_MISURAZIONE D
    WHERE D.STARTDATE >= @DATA_START AND D.ENDDATE <= @DATA_END
    GROUP BY DATE_BUCKET(SECOND, @NUM, STARTDATE);
    

    Note, however that for buckets that don't fit well into a minute, you might get undesired values. For example see the following:

    DECLARE @DT datetime2(0) = '2023-07-01T00:00:04';
    SELECT DATE_BUCKET(SECOND,7,@DT);
    

    This results in the value 2023-06-30 23:59:58. If you want the value taken from the start of the day, you could would need to pass a value for the function's 4th parameter, to define when a bucket starts. This might seem odd but use DATE_BUCKET again to truncate to the start of the date, as DATE_BUCKET requires the 3rd and 4th parameters have the same data type, and DATE_BUCKET retains the data type:

    DECLARE @DT datetime2(0) = '2023-07-01T00:00:04';
    
    SELECT DATE_BUCKET(SECOND,7,@DT, DATE_BUCKET(DAY,1,@dt));
    

    This now returns 2023-07-01 00:00:00 and also means that the bucket that starts at 23:59:58 would only be for 23:59:58 and 23:59:59.


    If you are using an older version, you can use some DATEADD/DATEDIFF and integer math "magic":

    SELECT DATEADD(SECOND, DATEDIFF(SECOND, '20000101', STARTDATE) / @Num * @Num, '20000101') AS Second_Bucket,
           AVG(D.VALORE) AS AvgValore
    FROM AA_V_DATI_MISURAZIONE D
    WHERE D.STARTDATE >= @DATA_START AND D.ENDDATE <= @DATA_END
    GROUP BY DATEADD(SECOND, DATEDIFF(SECOND, '20000101', STARTDATE) / @Num * @Num, '20000101');
    

    This solution does not address the bucketing issue for buckets that don't "fit nicely". You could, however, achieve this why replacing the arbitrary date with your column CAST/CONVERTed to a date.