Search code examples
mysqlmariadbaggregates

How do I subtract aggregated counts based on different states two columns in one table?


Using MariaDB, I am trying to get a monthly total of items that were created minus items that were deleted that month, for each month. If no items were deleted, the total should be just the number of items that were created that month. If more items were deleted than created, the total should be a negative number.

The table has a created_at column which is never null and a deleted_at column which is set once the item has been 'deleted'

To illustrate, the (simplified) schema is like this: TABLE Items:

+----------------------------------------------------------------------------+
|   idItem  | created_at                     | deleted_at                    |
+----------------------------------------------------------------------------+
|  1        |  2020-03-20T04:28:41.000+00:00 | 2021-07-27T02:36:05.000+00:00 |
|  2        |  2020-03-20T04:28:41.000+00:00 | 2021-07-27T02:36:05.000+00:00 |
|  3        |  2021-03-02T21:39:10.000+00:00 |  ∅                            |
|  4        |  2021-03-05T21:13:13.000+00:00 |  ∅                            |
|  5        |  2021-06-08T13:49:11.000+00:00 | 2021-07-27T02:36:05.000+00:00 |
|  6        |  2021-07-13T02:36:05.000+00:00 |  ∅
|  7        |  2021-09-17T21:12:13.000+00:00 |  ∅                            |
+----------------------------------------------------------------------------+

The information I need is the monthly total that have not been deleted, like so:

+-----------------------------------+
|   total_existing  | during_month  |     
+-----------------------------------+
|    2              | 2020-03       | -- two were added
+-----------------------------------+
|    4              | 2021-03       | -- another two were created
+-----------------------------------+
|    5              | 2021-06       | -- another was added
+-----------------------------------+
|    3              | 2021-07       | -- three deleted, one added
+-----------------------------------+
|    4              | 2021-09       | -- one added
+-----------------------------------+

Ultimately, I need to display the total for each month. I've tried this but it's not right.

SELECT
  count(created.idItem) AS monthly_created_count,
  count(deleted.idItem) AS monthly_deleted_count,
  count(created.idItem) - count(deleted.idItem) as total,
  DATE_FORMAT(created.created_at, '%Y-%m') as created_month ,
  DATE_FORMAT(deleted.deleted_at, '%Y-%m') as deleted_month 
FROM 
    Item created 
  LEFT JOIN 
    Item deleted 
  ON 
  DATE_FORMAT(deleted.deleted_at, '%Y-%m') = DATE_FORMAT(created.created_at, '%Y-%m')
GROUP BY DATE_FORMAT(created.created_at, '%Y-%m'), DATE_FORMAT(deleted.deleted_at, '%Y-%m')

I keep thinking I'm so close, but when we look at the rows where the deleted_at dates are set, it's obvious I'm off the mark.


Solution

  • If you're looking for a cumulative total of rows created/deleted, one approach is COUNT the number of records created and deleted by month/year separately. Then join the counts together with UNION ALL and calculate the sum totals:

    SELECT t.YearMonth
          , SUM(t.TotalCreated) - SUM(t.TotalDeleted) AS TotalExisting
    FROM  (
             SELECT DATE_FORMAT(created_at, '%Y-%m') AS YearMonth
                    , COUNT(*) AS TotalCreated
                    , 0 AS TotalDeleted
             FROM   Item
             GROUP BY DATE_FORMAT(created_at, '%Y-%m')
             
             UNION ALL
    
             SELECT DATE_FORMAT(deleted_at, '%Y-%m') AS YearMonth
                    , 0 AS TotalCreated 
                    , COUNT(*) AS TotalDeleted
             FROM   Item
             WHERE  deleted_at IS NOT NULL
             GROUP BY DATE_FORMAT(deleted_at, '%Y-%m')
    ) t         
    GROUP BY t.YearMonth
    ORDER BY t.YearMonth
    

    Results:

     YearMonth | TotalExisting
     :-------- | ------------:
     2020-03   |             2
     2021-03   |             2
     2021-06   |             1
     2021-07   |            -2
     2021-09   |             1
    

    Then wrap those statements in a CTE and use a Window Function to calculate the rolling total:

    See also db<>fiddle

    WITH cte AS (        
        SELECT t.YearMonth
               , SUM(t.TotalCreated) - SUM(t.TotalDeleted) AS TotalExisting
        FROM  (
                 SELECT DATE_FORMAT(created_at, '%Y-%m') AS YearMonth
                       , COUNT(*) AS TotalCreated
                       , 0 AS TotalDeleted
                 FROM   Item
                 GROUP BY DATE_FORMAT(created_at, '%Y-%m')
                 UNION ALL
                 SELECT DATE_FORMAT(deleted_at, '%Y-%m') AS YearMonth
                        , 0 AS TotalCreated 
                        , COUNT(*) AS TotalDeleted
                 FROM   Item
                 WHERE  deleted_at IS NOT NULL
                 GROUP BY DATE_FORMAT(deleted_at, '%Y-%m')
         ) t         
         GROUP BY t.YearMonth
         ORDER BY t.YearMonth
    )
    SELECT YearMonth, SUM(TotalExisting) OVER (ORDER BY YearMonth) AS TotalExisting
    FROM   cte;
    

    Final Results:

    YearMonth | TotalExisting
    :-------- | ------------:
    2020-03   |             2
    2021-03   |             4
    2021-06   |             5
    2021-07   |             3
    2021-09   |             4