Search code examples
mysqlgroup-bysumpartition-by

how to sum value by DATE_FORMAT(date,'%Y-%m-%d'),the limit 20 sum(day by day), and sum remaining value if greater than 20?


How to sum value by DATE_FORMAT(date,'%Y-%m-%d') and id the first 20 rows of data(order by data desc), and sum the remaining value if greater than 20 else 0? supposed I have below data with below SQL, how to do it? thanks so much for any advice.

SELECT SUM(value), id, DATE_FORMAT(date,'%Y-%m-%d') 
FROM test_table
GROUP BY id, DATE_FORMAT(date,'%Y-%m-%d')
sum(value)  id  DATE_FORMAT(date,'%Y-%m-%d')
-------------------------------------------- 
64.98   123456  2021-01-01
64.98   123456  2021-01-02
64.98   123456  2021-01-03
64.98   123456  2021-01-04
64.98   123456  2021-01-05
64.98   123456  2021-01-06
72.34   123456  2021-01-07
64.98   123456  2021-01-08
64.98   123456  2021-01-09
103.80  123456  2021-01-10
64.98   123456  2021-01-11
64.98   123456  2021-01-12
64.98   123456  2021-01-13
64.98   123456  2021-01-14
64.98   123456  2021-01-15
64.98   123456  2021-01-16
64.98   123456  2021-01-17
64.98   123456  2021-01-18
64.98   123456  2021-01-19
64.98   123456  2021-01-20
64.98   123456  2021-01-21
64.98   123456  2021-01-22
64.98   123456  2021-01-23
64.98   123456  2021-01-24
144.98  123456  2021-01-25
64.98   123456  2021-01-26
64.98   123456  2021-01-27
64.98   123456  2021-01-28
64.98   123456  2021-01-29
64.98   123456  2021-01-30
64.98   123456  2021-01-31
64.98   123456  2021-02-01
64.98   123456  2021-02-02
64.98   123456  2021-02-03
64.98   123456  2021-02-04
720.92  123456  2021-02-05
66.98   123456  2021-02-06
66.98   123456  2021-02-07
66.98   123456  2021-02-08
64.98   123456  2021-02-09
64.98   123456  2021-02-10
64.98   223456  2021-01-20
54.98   223456  2021-01-21

...................................

I need the output like below:

  id      day_1    day_2   .....  day_20   day_other 
    -------------------------------------------- 
   123456  64.98   64.98  .....  234.67      2342.12  
   223456  64.98   64.98  .....  234.67      2342.12   

Solution

  • Note: Unfortunately, part of the component in this solution won't work on older MySQL version; as OP mentioned in her previous question, she is using MySQL 5.7.

    I see, in that case you can upgrade the query a bit. Let's consider that maybe the condition will change somehow to first 30 rows of data, then you might want to use prepared statement so that you don't need to keep on changing the query.

    1. I'll be using 3 variables for this example:
      a. @columns : for generating the columns SUM(CASE ...
      b. @sql : for the final query.
      c. @maxrows : to define the "first of xx rows".

    2. All the variables will be set as:

    SET @columns := NULL;
    SET @sql := NULL;
    SET @maxrows := XX; /*how many first rows*/
    

    Setting @columns and @sql as NULL is just a precaution, and @maxrows is just the rows you required to pivot, in your current situation it should be SET @maxrows := 20;

    1. Setting the @columns variables.
    WITH RECURSIVE cte AS (
    SELECT 1 seq UNION ALL
    SELECT seq+1 FROM cte WHERE seq+1 <= @maxrows)
    SELECT GROUP_CONCAT(cols SEPARATOR ', \r\n') 
                 INTO @columns
    FROM (
    SELECT CONCAT('SUM(CASE WHEN rn =',seq,' THEN vals ELSE 0 END) AS "day_',seq,'"') 
             AS cols
    FROM cte UNION
    SELECT CONCAT('SUM(CASE WHEN rn >',seq,' THEN vals ELSE 0 END) AS "day_others"')
    FROM cte WHERE seq=@maxrows) c;
    

    There are two components here, first is using common table expression to generate row sequences based on @maxrows then the second one is concatenating the SUM(CASE .. that you have in your SELECT query. The last condition of WHEN rn > 20 is generated separately and combined using UNION with the first 20 SUM(CASE ... Lastly, the GEOUP_CONCAT() is necessary to make the generated syntax into one single row so that it can be assign to the @columns variable. Note: setting variable will only be valid if it's a single row. Otherwise, you'll get Subquery returns more than 1 row error.
    Run this query to check the @columns variable afterward: SELECT @columns;

    1. Setting the @sql variable:
    SELECT CONCAT('SELECT id,',@columns,'
    FROM
    (SELECT SUM(value) AS vals, id, DATE_FORMAT(date,"%Y-%m-%d") AS dt,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS rn
    FROM test_table
    GROUP BY id, date) v
    GROUP BY id;') INTO @sql;
    

    This is basically just concatenating the subquery that you already have with @columns variable and assign it to @sql variable. Check the variable: SELECT @sql;

    1. The last part is just preparing the statement based on @sql, execute then deallocate it:
    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    

    Demo fiddle with separated components and checkings..

    Demo fiddle with final forms.

    P/S: You should be able to copy this query and directly run it however if there's an error (other than the possible table or column names), then it's probably caused by the group_concat_max_len setting. If so, you can check the setting by running:

    SHOW VARIABLES like 'group_concat_max_len';
    

    If it shows 1024 (in bytes), that means it's the default setting and with the operation we're going with our GROUP_CONCAT() here, it's not going to work out. Therefore, you might want to either permanently increase the value or you can add this syntax before your full query:

    SET SESSION group_concat_max_len = 1000000;