mysql, data-warehouse, cumulative-sum, mysql-5.6

MySQL Calculate percentage change of sales between years based on SUM and GROUP BY


I have a data warehouse in which a SELECT query with SUM and GROUP BY produces the following output:

+------+-----------+-------------+------------+
| YEAR | ITEM TYPE | TOTAL_ITEMS | TOTAL_COST |
+------+-----------+-------------+------------+
| 2009 | TYPE-1    |          19 |        330 |
| 2009 | TYPE-2    |           1 |         10 |
| 2009 | TYPE-3    |          11 |        190 |
| 2010 | TYPE-1    |          11 |        220 |
| 2010 | TYPE-2    |           7 |        230 |
| 2010 | TYPE-3    |           3 |        360 |
+------+-----------+-------------+------------+

My question is how to add new columns that show the percentage change in total items and total cost between 2009 and 2010, with 2009 as the base year.

So the output will be something like this:

+------+-----------+-------------+------------+----------+----------+
| YEAR | ITEM TYPE | TOTAL_ITEMS | TOTAL_COST | ItemDiff | CostDiff |
+------+-----------+-------------+------------+----------+----------+
| 2009 | TYPE-1    |          19 |        330 | 0%       | 0%       |
| 2009 | TYPE-2    |           1 |         10 | 0%       | 0%       |
| 2009 | TYPE-3    |          11 |        190 | 0%       | 0%       |
| 2010 | TYPE-1    |          11 |        220 | -42.11%  | -33.33%  |
| 2010 | TYPE-2    |           7 |        230 | 600%     | 2200%    |
| 2010 | TYPE-3    |           3 |        360 | -72.73%  | 89.47%   |
+------+-----------+-------------+------------+----------+----------+

ITEM TYPE is a category made up of several items with different prices. I need to calculate the change per category, not per item.
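
For reference, the change being asked for is (new value - base value) * 100 / base value, computed separately for each category. A minimal sketch that plugs the TYPE-1 figures from the first table in as literals (the column aliases are made up for illustration):

```sql
-- Percentage change for TYPE-1, with 2009 as the base year.
-- MySQL's "/" operator returns a DECIMAL here, so no explicit cast is needed.
SELECT
  ROUND((11 - 19) * 100 / 19, 2)    AS ItemDiff_Type1,  -- -42.11
  ROUND((220 - 330) * 100 / 330, 2) AS CostDiff_Type1;  -- -33.33
```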

The query I have so far is:

SELECT
  date_dim.year,
  item_dim.item_type,
  SUM(fact.total_item) AS TotalItems,
  SUM(fact.total_cost) AS TotalCost
FROM fact
  INNER JOIN date_dim
    ON fact.date_key = date_dim.date_key
  INNER JOIN item_dim
    ON fact.item_key = item_dim.item_key
WHERE date_dim.year BETWEEN 2009 AND 2011
GROUP BY date_dim.year,
         item_dim.item_type  

Please have a look at the following fiddle where the schema and the query are already built.

http://sqlfiddle.com/#!9/8e53c0/2

Here is the simplified ERD...

ERD

Thank you in advance for any help...


Solution

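  • Here are two queries that achieve this. Both join the yearly summary to itself so that each row is compared with the same item type in the previous year: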

    Using a MySQL common table expression (CTEs require MySQL 8.0 or later, so this does not run on the MySQL 5.6 SQL Fiddle):

    WITH summary_table AS 
      (SELECT
        substr(date_dim.year,1,4) year,
        item_dim.item_type,
        SUM(fact.total_item) TotalItems,
        SUM(fact.total_cost) AS TotalCost 
      FROM fact
        INNER JOIN date_dim
          ON fact.date_key = date_dim.date_key
        INNER JOIN item_dim
          ON fact.item_key = item_dim.item_key
      WHERE date_dim.year BETWEEN 2009 AND 2011
      GROUP BY date_dim.year,
               item_dim.item_type) 
      SELECT  
         A.*, 
         CASE WHEN (A.TotalItems IS NULL OR B.TotalItems IS NULL OR B.TotalItems=0) THEN 0 ELSE
           (A.TotalItems - B.TotalItems)*100/B.TotalItems END AS ItemDiff,
         CASE WHEN (A.TotalCost IS NULL OR B.TotalCost IS NULL OR B.TotalCost=0) THEN 0 ELSE
            (A.TotalCost - B.TotalCost)*100/B.TotalCost END AS CostDiff
      FROM summary_table A LEFT JOIN summary_table B
       ON A.YEAR=(B.YEAR+1) AND A.ITEM_TYPE=B.ITEM_TYPE;
    

    Without a CTE (see the demo on SQL Fiddle):

    SELECT 
         A.*, 
         CASE WHEN (A.TotalItems IS NULL OR B.TotalItems IS NULL OR B.TotalItems=0) THEN 0 ELSE
           (A.TotalItems - B.TotalItems)*100/B.TotalItems END AS ItemDiff,
         CASE WHEN (A.TotalCost IS NULL OR B.TotalCost IS NULL OR B.TotalCost=0) THEN 0 ELSE
            (A.TotalCost - B.TotalCost)*100/B.TotalCost END AS CostDiff
    FROM (SELECT
        substr(date_dim.year,1,4) year,
        item_dim.item_type,
        SUM(fact.total_item) AS TotalItems,
        SUM(fact.total_cost) AS TotalCost 
      FROM fact
        INNER JOIN date_dim
          ON fact.date_key = date_dim.date_key
        INNER JOIN item_dim
          ON fact.item_key = item_dim.item_key
      WHERE date_dim.year BETWEEN 2009 AND 2011
      GROUP BY date_dim.year,
               item_dim.item_type) A LEFT JOIN (SELECT
        substr(date_dim.year,1,4) year,
        item_dim.item_type,
        SUM(fact.total_item) AS TotalItems,
        SUM(fact.total_cost) AS TotalCost 
      FROM fact
        INNER JOIN date_dim
          ON fact.date_key = date_dim.date_key
        INNER JOIN item_dim
          ON fact.item_key = item_dim.item_key
      WHERE date_dim.year BETWEEN 2009 AND 2011
      GROUP BY date_dim.year,
               item_dim.item_type) B
    ON A.YEAR=(B.YEAR+1) AND A.ITEM_TYPE=B.ITEM_TYPE;
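
    On MySQL 8.0 or later, the self-join can also be replaced with the LAG() window function. This is a different technique from the two queries above, shown only as a sketch. It assumes date_dim.year holds a plain year value (as the WHERE clause suggests), and the aliases s and w are made up for illustration. Note that LAG() compares each row with the previous row that exists for the same item type, so if a year is missing for a type the comparison silently skips over it, whereas the self-join on YEAR + 1 would return 0.

    ```sql
    WITH summary_table AS (
      SELECT
        date_dim.year        AS year,
        item_dim.item_type,
        SUM(fact.total_item) AS TotalItems,
        SUM(fact.total_cost) AS TotalCost
      FROM fact
        INNER JOIN date_dim
          ON fact.date_key = date_dim.date_key
        INNER JOIN item_dim
          ON fact.item_key = item_dim.item_key
      WHERE date_dim.year BETWEEN 2009 AND 2011
      GROUP BY date_dim.year,
               item_dim.item_type
    )
    SELECT
      s.*,
      -- LAG() is NULL for the first year of each type; NULLIF guards a zero base.
      -- In both cases the expression is NULL and COALESCE turns it into 0.
      COALESCE(ROUND((s.TotalItems - LAG(s.TotalItems) OVER w) * 100
                     / NULLIF(LAG(s.TotalItems) OVER w, 0), 2), 0) AS ItemDiff,
      COALESCE(ROUND((s.TotalCost - LAG(s.TotalCost) OVER w) * 100
                     / NULLIF(LAG(s.TotalCost) OVER w, 0), 2), 0) AS CostDiff
    FROM summary_table s
    WINDOW w AS (PARTITION BY s.item_type ORDER BY s.year)
    ORDER BY s.year, s.item_type;
    ```

    To get text values with a percent sign, as in the desired output, each rounded expression can be wrapped in CONCAT(..., '%').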