Search code examples
mysqlsqljoinduplicatescross-join

Duplicate pairs in SQL cross join?


I know there are a lot of similar questions to this out there on SO, and I looked at some of them, namely this, but its not enough to solve my problem

I have a table proposals that looks something like this:

proposalNum | PI        | department | investigator | investigatorDepartment|
---------------------------------------------------------------------------- 
FP00003521  | Bush,Raj  | GIS        | Amm,Anna     | CIS                   |
FP00003521  | Bush,Raj  | GIS        | Milton,Ross  | CIS                   |
FP00003521  | Bush,Raj  | GIS        | Landis, Amy  | SEB                   |
FP00069606  | Mill, Ash | DIA        | Keller, Bill | FAA                   |

and I basically want to look at the investigator field and create:

Amm,Anna     | Milton, Ross 
Amm,Anna     | Landis, Amy
Milton,Ross  | Landis, Amy 

for all of the investigators that have CIS, and for a given proposalNum(in this case, FP00003521)

(Here is the SQL FIDDLE)

I did this:

SELECT p1.investigator,p2.investigator
FROM proposals AS p1
CROSS JOIN proposals AS p2 
WHERE p1.investigatorDepartment='CIS' 
AND p1.investigator!=p2.investigator
AND p1.proposalNum=p2.proposalNum

and got:

Milton,Ross  | Amm,Anna 
Amm,Anna     | Milton,Ross
Amm,Anna     | Landis, Amy
Milton,Ross  | Landis,Amy

But there's a repeat there...

Following the logic on the linked post, I also tried:

SELECT p1.investigator,p2.investigator
FROM proposals AS p1
CROSS JOIN proposals AS p2 
ON p1.investigator < p2.investigator
WHERE p1.investigatorDepartment='CIS' 
AND p1.investigator!=p2.investigator
AND p1.proposalNum=p2.proposalNum

and I got this:

Amm,Anna     | Milton,Ross
Amm,Anna     | Landis, Amy

The problem is when I add the ON p1.investigator < p2.investigator condition, it also won't add Milton,Ross and Landis, Amy since M > L

How would I re-write my query to solve this problem?

Any help would be greatly appreciated, thank you!!


Solution

  • You can join those p2's that are CIS differently to those that aren't. One way is as follows.

    select
      p1.investigator,
      p2.investigator
    from
      proposals AS p1
        cross join
      proposals AS p2 
    where
      p1.investigatorDepartment = 'CIS' and ((
          p2.investigatorDepartment = 'CIS' and 
          p1.investigator < p2.investigator
      ) or (
          p2.investigatorDepartment != 'CIS' or
          p2.investigatorDepartment is null
      )) and
      p1.proposalNum = p2.proposalNum;
    

    SQL Fiddle