Search code examples
sqlwindow-functionscartesian-product

Enumerating and correlating tables to avoid cartesian joins


I have a set of tables that I want to pull all the data from, with columns listing the contents of the ancillary tables.

For example:

states

code name
AR Arkansas
CT Connecticut
MN Maine

senators

code name
AR Katie Britt
AR Tommy Tuberville
CT Chris Murphy
CT Richard Blumenthal
MN Angus King
MN Susan Collins

representatives

code name
AR Bruce Westerman
AR French Hill
AR Rick Crawford
AR Steve Womack
CT Jahana Hayes
CT Jim Himes
CT Joe Courtney
CT John B. Larson
CT Rosa DeLauro
MN Chellie Pingree
MN Jared Golden

I want a column for the state, a column for the senators, and a column for the representatives like this:

State Senator Representative
Arkansas Katie Britt Bruce Westerman
Arkansas Tommy Tuberville French Hill
Arkansas Rick Crawford
Arkansas Steve Womack
Connecticut Chris Murphy Jahana Hayes
Connecticut Richard Blumenthal Jim Himes
Connecticut Joe Courtney
Connecticut John B. Larson
Connecticut Rosa DeLauro
Maine Angus King Chellie Pingree
Maine Susan Collins Jared Golden

What I've come up with so far is this:

select states.name state, sens.name senator, reps.name representative
from states
left join
( select state_code, name
  , row_number() over (partition by state_code order by name) rn
  from representatives
) reps on reps.state_code = states.code 
full outer join 
( select state_code, name
  , row_number() over (partition by state_code order by name) rn
  from senators
) sens on sens.state_code = states.code and sens.rn = reps.rn

(apologies, I earlier posted an incorrect version with the two joins the wrong way around)

That works, but it relies on me joining to representatives first and then to senators, since there are always more reps. It breaks down if the representatives list is not populated for a state or if there are fewer rows populated than for senators.

My real world requirement is not states and senators and representatives, but this is an easy-to-explain analogy. The second and third tables are actually all in one table, and I need to do a dozen or more joins to it, not just two.

Any ideas how I can make this more general to handle the scenario where the second table in the from-join clause has fewer rows than the third, or zero?

UPDATE WITH SOLUTION

Thanks to Chris Maurer for getting me further along the track. I am now trying to extend this to a third ancillary table.

governors

code name
MN Janet Mills
select states.name as state, g.name as governor, s.name as senator, r.name as representative
from states
left join 
(
 (select g.*
  , row_number() over (partition by state_code order by name) as nbr
  from governors g) g
 full outer Join
 (Select s.*
  , row_number() over (partition by state_code order by name) as nbr
  from senators s) s
 on s.state_code=g.state_code and s.nbr=g.nbr
 full outer join
 (select r.*
  , row_number() over (partition by state_code order by name) as nbr
  from representatives r) r
 on r.state_code=coalesce(g.state_code,s.state_code) and r.nbr=coalesce(g.nbr,s.nbr)
) on states.code = coalesce(r.state_code,s.state_code,g.state_code)
State Governor Senator Representative
Arkansas Katie Britt Bruce Westerman
Arkansas Tommy Tuberville French Hill
Arkansas Rick Crawford
Arkansas Steve Womack
Connecticut Chris Murphy Jahana Hayes
Connecticut Richard Blumenthal Jim Himes
Connecticut Joe Courtney
Connecticut John B. Larson
Connecticut Rosa DeLauro
Maine Janet Mills Angus King Chellie Pingree
Maine Susan Collins Jared Golden

I think that works. I was going to ask something, but I fixed it by adding a coalesce to the on clause of the r.state join.


Solution

  • You are on the right track. Number your base tables first, full outer join them together second, and join in state name third.

    Select st.name as state, s.name as senator, r.name as representative
    From (
        (Select *
            , row_number() Over (Partition by code) as nbr
         From senators) s
      Full Outer Join
        (Select *
            , row_number() Over (Partition by code) as nbr
         From representatives) r
      On s.code=r.code and s.nbr=r.nbr
    ) Inner Join states st On Coalesce(r.code,s.code)=st.code