Search code examples
summaxwindow-functionsminmysql-8.0

Calculate sum of maximum and minimum value for repeated group of entries in MySQL8


LogDateAndTime BatchDate TagLetter Totaliser ExpectedResult
10-11-2020 09:06:14 10-11-2020 08:29:55 A 6319 31
10-11-2020 09:06:24 10-11-2020 08:29:55 A 6337 31
10-11-2020 09:08:14 10-11-2020 08:29:55 B 6355 31
10-11-2020 09:08:24 10-11-2020 08:29:55 B 6372 31
10-11-2020 09:08:34 10-11-2020 08:29:55 B 6378 31
10-11-2020 09:08:44 10-11-2020 08:29:55 A 6383 31
10-11-2020 09:09:14 10-11-2020 08:29:55 A 6388 31
10-11-2020 09:09:24 10-11-2020 08:29:55 A 6396 31
10-11-2020 09:09:34 10-11-2020 08:29:55 B 6409 31
10-11-2020 09:09:44 10-11-2020 08:29:55 B 6426 31
10-11-2020 09:10:24 10-11-2020 08:29:55 B 6442 31

The above table has LogDateAndTime(Primary_Key) column with unique datetime entries. The BatchDate column holds same datetime values throughout the entire batch. I need to calculate the sum of MAX(Totaliser)-MIN(Totaliser) for each instance of TagLetter=A so that i should ignore values in TagLetter=B. My ExpectedResult in this case will be SUM[(6337-6319)+(6396-6383)]=31. I tried below query, but did not get the expected result.

SELECT SUM(
           CASE 
             WHEN TagLetter='A' THEN MAX(Totaliser)-MIN(Totaliser) 
             ELSE 0.0 
           END
          ) OVER (PARTITION BY BatchDate) AS ExpectedResult

In this case, it is calculating 6396-6319=77 which is not the expected result. Could someone help me to get the result right?


Solution

  • First create the groups of consecutive 'A's with window functions LAG() and SUM() and then aggregate on these groups:

    WITH cte AS (
      SELECT DISTINCT SUM(MAX(Totaliser) - MIN(Totaliser)) OVER () ExpectedResult
      FROM (
        SELECT *, SUM(flag) OVER (ORDER BY LogDateAndTime) grp
        FROM (
          SELECT *, LAG(TagLetter, 1, '') OVER (ORDER BY LogDateAndTime) <> 'A' flag
          FROM tablename 
        ) t
        WHERE TagLetter = 'A'
      ) t
      GROUP BY grp
    )
    SELECT t.*, c.ExpectedResult
    FROM tablename t CROSS JOIN cte c
    

    Or if you want results for each BatchDate:

    WITH cte AS (
      SELECT DISTINCT BatchDate,
             SUM(MAX(Totaliser) - MIN(Totaliser)) OVER () ExpectedResult
      FROM (
        SELECT *, SUM(flag) OVER (PARTITION BY BatchDate ORDER BY LogDateAndTime) grp
        FROM (
          SELECT *, LAG(TagLetter, 1, '') OVER (PARTITION BY BatchDate ORDER BY LogDateAndTime) <> 'A' flag
          FROM tablename 
        ) t
        WHERE TagLetter = 'A'
      ) t
      GROUP BY BatchDate, grp
    )
    SELECT t.*, c.ExpectedResult
    FROM tablename t LEFT JOIN cte c
    ON c.BatchDate = t.BatchDate
    

    See the demo.
    Results:

    > LogDateAndTime      | BatchDate           | TagLetter | Totaliser | ExpectedResult
    > :------------------ | :------------------ | :-------- | --------: | -------------:
    > 10-11-2020 09:06:14 | 10-11-2020 08:29:55 | A         |      6319 |             31
    > 10-11-2020 09:06:24 | 10-11-2020 08:29:55 | A         |      6337 |             31
    > 10-11-2020 09:08:14 | 10-11-2020 08:29:55 | B         |      6355 |             31
    > 10-11-2020 09:08:24 | 10-11-2020 08:29:55 | B         |      6372 |             31
    > 10-11-2020 09:08:34 | 10-11-2020 08:29:55 | B         |      6378 |             31
    > 10-11-2020 09:08:44 | 10-11-2020 08:29:55 | A         |      6383 |             31
    > 10-11-2020 09:09:14 | 10-11-2020 08:29:55 | A         |      6388 |             31
    > 10-11-2020 09:09:24 | 10-11-2020 08:29:55 | A         |      6396 |             31
    > 10-11-2020 09:09:34 | 10-11-2020 08:29:55 | B         |      6409 |             31
    > 10-11-2020 09:09:44 | 10-11-2020 08:29:55 | B         |      6426 |             31
    > 10-11-2020 09:10:24 | 10-11-2020 08:29:55 | B         |      6442 |             31