Search code examples
mysqljoinmany-to-manyduplicatesrelational

MySQL query is returning duplicates when it shouldn't


I have three tables:

Person

+--------+-----------+  
| fName  | lName     |  
+--------+-----------+  
| Paul   | McCartney |  
| John   | Lennon    |  
| Jon    | Stewart   |  
| Daniel | Tosh      |  
| Steven | Colbert   |  
| Pink   | Floyd     |  
| The    | Beatles   |  
| Arcade | Fire      |  
| First  | Last      |  
| Andrew | Bird      |  
+--------+-----------+  

Publication

+----+---------------------------------------+------+-----------+---------+  
| id | title                                 | year | pageStart | pageEnd |  
+----+---------------------------------------+------+-----------+---------+  
|  9 | The Dark Side of the Moon             | 1973 |         0 |       0 |  
| 10 | Piper At The Gates of Dawn            | 1967 |         0 |       0 |  
| 11 | Sgt. Pepper's Lonely Hearts Band Club | 1967 |         0 |       0 |  
| 12 | Happy Thoughts                        | 2007 |         0 |      60 |  
| 13 | Wish You Were Here                    | 1975 |         0 |       0 |  
| 14 | Funeral                               | 2004 |         0 |       0 |  
+----+---------------------------------------+------+-----------+---------+  

Person_Publication

+-----------+----------------+--------+---------------+  
| person_id | publication_id | editor | author_number |  
+-----------+----------------+--------+---------------+  
|        11 |             11 |      0 |             1 |  
|        12 |             11 |      0 |             1 |  
|        16 |              9 |      0 |             1 |  
|        17 |             11 |      0 |             1 |  
+-----------+----------------+--------+---------------+  

I'm trying to select all authors of a certain publication using the following query:

SELECT fName , lName 
FROM Publication , Person, Person_Publication 
WHERE Person.id = Person_Publication.person_id 
AND Person_Publication.publication_id = 11;

But the results I get are always duplicates (always 6x for some reason). The results:

+-------+-----------+
| fName | lName     |
+-------+-----------+
| Paul  | McCartney |
| John  | Lennon    |
| The   | Beatles   |
| Paul  | McCartney |
| John  | Lennon    |
| The   | Beatles   |
| Paul  | McCartney |
| John  | Lennon    |
| The   | Beatles   |
| Paul  | McCartney |
| John  | Lennon    |
| The   | Beatles   |
| Paul  | McCartney |
| John  | Lennon    |
| The   | Beatles   |
| Paul  | McCartney |
| John  | Lennon    |
| The   | Beatles   |
+-------+-----------+
18 rows in set (0.03 sec)

Can somebody please tell me why this is happening and how to fix this?


Solution

  • You are including three tables in your query:

    FROM Publication, Person, Person_Publication 
    

    but you have only one join condition:

    WHERE Person.id = Person_Publication.person_id 
    

    You end up with a cartesian product between Publication and Person JOIN Person_Publication

    Add the following condition to your WHERE block:

    AND Publication.id = Person_Publication.publication.id
    

    A perfect example of why the explicit JOIN syntax is prefered. With the following syntax:

    SELECT fName, lName 
    FROM Publication
    JOIN Person_Publication ON Person_Publication.publication.id = Publication.id
    JOIN Person ON Person.id = Person_Publication.person_id
    WHERE Person_Publication.publication_id = 11;
    

    .. such a mistake simply cannot happen.