Search code examples
sqlinner-joindistinct

How to apply inner join and find distinct value from a different table?


I have 3 SQL tables.

  1. tableA
  2. tableB
  3. mappingAB // Contains mapping between tableA and tableB
db=> \d tableA
                                Table "public.tableA"
            Column          |           Type           | Collation | Nullable | Default
--------------------------+--------------------------+-----------+----------+---------
 id_A                       | character varying(512)   |           | not null |

db=> \d tableB
                                Table "public.tableB"
              Column          |           Type           | Collation | Nullable | Default
--------------------------+--------------------------+-----------+----------+---------
 id_B                         | character varying(512)   |           | not null |
 source                       | character varying(512)   |           | not null |

db=> \d mappingAB
                                Table "public.mappingAB"
            Column          |           Type           | Collation | Nullable | Default
--------------------------+--------------------------+-----------+----------+---------
 id_A                       | character varying(512)   |           | not null |
 id_B                       | character varying(512)   |           | not null |

Now, I want to find the unique values of source from tableB that relates to a single row of tableA's id_A.

Data for example:

tableA

 id_A
---------
 insight1
 insight2
 insight3
 insight4
 insight5

mappingAB

id_A        | id_B 
------------+--------
 insight1   | obs1
 insight1   | obs2
 insight2   | obs1
 insight2   | obs2
 insight2   | obs3
 insight3   | obs1

tableB

id_B    | source 
--------+--------
 obs1   | AWS
 obs2   | GCP
 obs3   | GCP
Expected outcome (maps source from tableB to id_A in tableA uniquely)

id_A       | AWS    | GCP
-----------+--------+-----------------------
 insight1  | 1      | 1
 insight2  | 1      | 1
 insight1  | 1      | 0

I want to find how many unique source are related to an id_A from tableA.

I have tried running a query like this but that fails to return the correct results.

select obs.source, count(obs.source)
from tableB b inner join
mappingAB ab on ab.id_B = obs.id_B
group by obs.source

Solution

  • The Cte is to get id_A and source, then using group by and max to get the expected data :

    with cte as (
      select A.id_A, source
      from tableA A
      inner join mappingAB AB on AB.id_A = A.id_A
      inner join tableB B on AB.id_B = B.id_B
    )
    select id_A,
        MAX(case when source = 'AWS' then 1 else 0 end) as AWS,
        MAX(case when source = 'GCP' then 1 else 0 end) as GCP
    from cte
    group by id_A;
    

    For previous versions that are not supporting WITH :

    select A.id_A,
            MAX(case when source = 'AWS' then 1 else 0 end) as AWS,
            MAX(case when source = 'GCP' then 1 else 0 end) as GCP
    from tableA A
    inner join mappingAB AB on AB.id_A = A.id_A
    inner join tableB B on AB.id_B = B.id_B
    group by id_A
    

    Demo here