Search code examples
sqlpostgresqlviewleft-joinaggregate

Create view with aggregated columns from three levels of nested tables


I'm using Postgres 9.4 and I have 3 tables: truck , container and container_metadata.
A truck can have many container and a container many container_metadata.

I'll add more description later with the tables below (I did my best creating them), so here goes:

enter image description here

A truck can contain many containers referred by key truck_id

enter image description here enter image description here

A container then is described by the container_metadata table, a container can also have many records of different types, but in this case, I will be focusing only in 'subcontainer' type.

The problem I'm trying to solve is to create a view for ease of querying, and model creation (to be used for databrowser). I'll add more details below.

enter image description here

In the last column, all good is true if a truck has:

  • At least one container.
  • A container should have at least one record of type subcontainer in the metadata.
  • Volume values of all subcontainers should be positive non zero.

More details:

  • If a truck has no container, then I should have a column to describe whether I need to create container for it.
  • If a truck has containers but no subcontainer, then I should also be able to tell I need to create subcontainer records for it.
  • If a truck has 10 containers, but only 6 of the containers have atleast one subcontainer, then it's still not good, and I should be able to tell that I still need to create a subcontainer.
  • If a truck's containers have sub containers, but one of the subcontainer has a volume of 0, again, it's not good, and I should also notify that we need to fill it .

I have only been able to count the containers, subcontainers I've been trying out aggregate functions, but I'm new to this problem, I hope the tables help. Please do ask if it needs more clarification.

Query

My attempt so far:

select 
t.id,t.name, count(c.id) as container_count, count(cm.id) as subcontainer_count
from
    public.truck t
left join
    public.container c
on
    c.truck_id = t.id
left join
    public.container_metadata cm
on
    cm.container_id = c.id and type = 'subcontainer'
group by t.id
;

Assuming schema is public. Above is the query I've tried but it outputs wrong number of subcontainers, and that's how far I've got.


Solution

  • Assuming container_metadata.volume to be NOT NULL DEFAULT 0, this should do the complete job:

    SELECT t.id, t.name
         , COALESCE(c.cont_ct, 0)      AS cont_ct
         , COALESCE(c.sub_ct, 0)       AS sub_ct
         , COALESCE(c.empty_ct, 0)     AS empty_ct
         , c.truck_id IS NULL          AS cont_missing
         , c.sub_missing IS NOT FALSE  AS sub_missing
         ,(c.empty_ct = 0) IS NOT TRUE AS sub_needfill
         , c.empty_ct = 0 AND NOT c.sub_missing AS all_good
    FROM   truck t
    LEFT   JOIN (
       SELECT truck_id
            , count(*) AS cont_ct
            , sum(cm.ct_sub) AS sub_ct
            , sum(cm.ct_empty) AS empty_ct
            , bool_or(cm.container_id IS NULL) AS sub_missing
       FROM   container c
       LEFT   JOIN (
          SELECT container_id
               , count(*) AS ct_sub
               , count(*) FILTER (WHERE volume = 0) AS ct_empty
          FROM   container_metadata
          WHERE  type = 'subcontainer'  -- only those seem relevant
          GROUP  BY 1
          ) cm ON cm.container_id = c.id
       GROUP  BY 1
       ) c ON c.truck_id = t.id;
    

    The major feature is to aggregate first and then join to the next upper level. Not only is it typically faster when processing most or all of the table anyway, it also allows to aggregate each level in one fell swoop:

    Be wary of null values. Those can be introduced by column values or by the LEFT JOIN (missing rows). It is essential to observe which columns can be null. The query is only valid for matching table definitions.

    Be sure to understand logical and comparison operators, in particular involving null values.

    The aggregate FILTER clause requires Postgres 9.4: