Search code examples
sqlpostgresqldenormalization

How to calculate a non-inflated count from a denormalized table


Suppose I have a denormalized table that includes an ID and a value that I need to count. Something like this:

 Tree_ID | ...other columns... |  Count_If_True
------------------------------------------------
       1 | ...other values...  |           True
       1 | ...other values...  |           True
       2 | ...other values...  |           True
       2 | ...other values...  |           True
       3 | ...other values...  |           True

In this case, select Tree_ID, count(Count_If_True) from Table group by Tree_ID would show:

 Tree_ID |  count(Count_If_True)
---------------------------------
       1 |                     2
       2 |                     2
       3 |                     1

But If I denormalize my table further with a join from an Apples table (where every tree has multiple apples), it would look something like this:

Apple_ID | Tree_ID | ...other columns... |  Count_If_True
------------------------------------------------
       1 |       1 | ...other values...  |           True
       2 |       1 | ...other values...  |           True
       3 |       1 | ...other values...  |           True
       4 |       1 | ...other values...  |           True
       5 |       1 | ...other values...  |           True
       6 |       1 | ...other values...  |           True
       7 |       2 | ...other values...  |           True
       8 |       2 | ...other values...  |           True
       9 |       2 | ...other values...  |           True
      10 |       2 | ...other values...  |           True
      11 |       2 | ...other values...  |           True
      12 |       2 | ...other values...  |           True
      13 |       2 | ...other values...  |           True
      14 |       2 | ...other values...  |           True
      15 |       3 | ...other values...  |           True
      16 |       3 | ...other values...  |           True
      17 |       3 | ...other values...  |           True
      18 |       3 | ...other values...  |           True
      19 |       3 | ...other values...  |           True

This would inflate our count to:

 Tree_ID |  count(Count_If_True)
---------------------------------
       1 |                     6
       2 |                     8
       3 |                     5

Is there a simple way (without a CTE, for example) to write a single query to get back the original count result before Apple_IDs were introduced?


Solution

  • You need a distinct row identifier in the first table -- perhaps that is among the other columns. It can be one or more columns. Then you can use count(distinct):

    select tree_id,
           count(distinct <unique row column>) filter (where count_if_true)
    from t
    group by tree_id;