Search code examples
sqlpostgresqldenormalization

How to calculate a non-inflated SUM from a denormalized table


this is kind of building off a previous question I asked. Suppose I have a denormalized table that looks something like this:

    Apple_ID | Tree_ID | Orchard_ID | Tree_Height | ...other columns...
   ---------------------------------------------------------------------
           1 |       1 |          1 |          12 |  ...other values...
   ---------------------------------------------------------------------
           2 |       1 |          1 |          12 |  ...other values...
   ---------------------------------------------------------------------
           3 |       1 |          1 |          12 |  ...other values...
   ---------------------------------------------------------------------
           4 |       2 |          1 |          15 |  ...other values...
   ---------------------------------------------------------------------
           5 |       2 |          1 |          15 |  ...other values...
   ---------------------------------------------------------------------
           6 |       2 |          1 |          15 |  ...other values...
   ---------------------------------------------------------------------
           7 |       2 |          1 |          15 |  ...other values...
   ---------------------------------------------------------------------
           8 |       3 |          1 |          20 |  ...other values...
   ---------------------------------------------------------------------
           9 |       3 |          1 |          20 |  ...other values...
   ---------------------------------------------------------------------
          10 |       4 |          2 |          30 |  ...other values...
   ---------------------------------------------------------------------
          11 |       5 |          2 |          10 |  ...other values...
   ---------------------------------------------------------------------
          12 |       5 |          2 |          10 |  ...other values...
   ---------------------------------------------------------------------
          13 |       5 |          2 |          10 |  ...other values...
   ---------------------------------------------------------------------

I want to calculate the sum of Tree_Heights in each orchard, so the result I want to get back is:

Orchard_ID | sum(Tree_Height)
-------------------------------
         1 |               47
-------------------------------
         2 |               40

However, due to the denormalization, the sum inflates to this:

Orchard_ID | sum(Tree_Height)
-------------------------------
         1 |              136
-------------------------------
         2 |               60

The solution from the question I mentioned before cannot be implemented here, since we cannot sum() by a unique row column. How can I write a simple query to get the intended result?


Solution

  • The easiest way to write this is with a CTE, but if your system doesn't support that you can use a derived table. We use ROW_NUMBER() over the Apple_ID for each tree to give us a unique row to sum:

    SELECT "Orchard_ID", SUM("Tree_Height") AS Total_Height
    FROM (
      SELECT "Orchard_ID", "Tree_Height", 
             ROW_NUMBER() OVER (PARTITION BY "Orchard_ID", "Tree_ID" ORDER BY "Apple_ID") AS rn
      FROM data
    ) d
    WHERE rn = 1
    GROUP BY "Orchard_ID"
    

    Output

    Orchard_ID  total_height
    1           47
    2           40
    

    If you could use CTEs this is how it would be written:

    WITH CTE AS (
      SELECT "Orchard_ID", "Tree_Height", 
             ROW_NUMBER() OVER (PARTITION BY "Orchard_ID", "Tree_ID" ORDER BY "Apple_ID") AS rn
      FROM data
    )
    SELECT "Orchard_ID", SUM("Tree_Height") AS Total_Height
    FROM CTE
    WHERE rn = 1
    GROUP BY "Orchard_ID"
    

    Demo on SQLFiddle