Search code examples
mysqlduplicatesuniquedistinctself-join

mysql match duplicates on one column and ONLY different on another column


I have a mysql table (we'll call it 'user') with about 700,000 rows of data. Let's say here is a small sample:

UserID  DeptID  Name    Score
1       2       Bob     50
2       2       Bob     100
3       3       Jane    32
4       4       Jill    93
5       2       Bob     50
6       3       Jane    58
7       7       Jane    44

I want to show all the rows with a duplicate 'Name' AND 'DeptID', BUT ONLY different 'Score'. (UserID is irrelevant for selection, it just needs to be displayed.)

UserID  DeptID  Name    Score
1       2       Bob     50
2       2       Bob     100
3       3       Jane    32
6       3       Jane    58

So basically as you can see from above, I do not want to show the other Bob (UserID 5) with a Score of 50 (even though he is in Dept 2).

I have part of the query working as shown below, except it shows all the Bob's and I don't know how to get it to only show where the score's are different.

SELECT A.UserID, A.DeptID, A.Name, A.Score
FROM User AS A
INNER JOIN (
    SELECT Name, DeptID
    FROM User
    GROUP BY Name, DeptID
    HAVING COUNT(*) > 1
) AS B ON A.Name = B.Name AND A.DeptID = B.DeptID

This query runs within a decent amount of time. Everything else I've tried either gave incorrect results, a query error, or the query ran forever and I finally killed the process.

I know I'm missing something simple, but I'm having a hard time seeing what it is....

EDIT - Okay, Barmar's answer was what I was looking for, thanks! (Guess I was just staring at my SQL statement for too long, lol)

Here's a new twist though. I only want it to show a particular someone if there are 2 (or more) different scores (within the same DeptID). So for instance if there were only two Bob's that both had 50 (the 100 didn't exist), neither would be displayed.


Solution

  • Add GROUP BY A.Name, A.Score to the end so you only get one row for each combination of name and score.

    And if you only want to show them if there are two different scores within the same department, use COUNT(DISTINCT Score) instead of COUNT(*).

    SELECT A.UserID, A.DeptID, A.Name, A.Score
    FROM User AS A
    INNER JOIN (
        SELECT Name, DeptID
        FROM User
        GROUP BY Name, DeptID
        HAVING COUNT(DISTINCT Score) > 1
    ) AS B ON A.Name = B.Name AND A.DeptID = B.DeptID
    GROUP BY A.Name, A.Score