Duplicates in Database, Help Edit My Query to Filter Them Out?

I have just finished my latest task of creating an RSS Feed using PHP to fetch data from a database.

I've only just noticed that a lot (if not all) of these items have duplicates and I was trying to work out how to only fetch one of each.

I thought about printing only every second row in my PHP loop so that I'd get one of each set of duplicates, but in some cases there are 3 or 4 of each article, so it has to be done in the query.


SELECT *
FROM uk_newsreach_article t1
    INNER JOIN uk_newsreach_article_photo t2
        ON t1.id = t2.newsArticleID
    INNER JOIN uk_newsreach_photo t3
        ON t2.newsPhotoID = t3.id
ORDER BY t1.publishDate DESC;

Table Structures:

uk_newsreach_article
id | headline | extract | text | publishDate | ...

uk_newsreach_article_photo
id | newsArticleID | newsPhotoID

uk_newsreach_photo
id | htmlAlt | URL | height | width | ...

For some reason there are lots of duplicates, and the only thing truly unique within each set of data is uk_newsreach_article_photo.id: uk_newsreach_article_photo.newsArticleID and uk_newsreach_article_photo.newsPhotoID are identical across a set of duplicates. All I need is one row from each set, e.g.

Sample Data (uk_newsreach_article_photo)

id | newsArticleID | newsPhotoID
 2 |     800482746 |     7044521
10 |     800482746 |     7044521
19 |     800482746 |     7044521
29 |     800482746 |     7044521
39 |     800482746 |     7044521
53 |     800482746 |     7044521
67 |     800482746 |     7044521

I tried sticking a DISTINCT into the query along with specifying the actual columns I wanted but this didn't work.


  • As you have noticed, the DISTINCT operator will return every id. You could use a GROUP BY instead.

    You will have to decide which id you want to retain. In the example, I have used MIN, but any aggregate function would do.

    SQL Statement

    SELECT MIN(t2.id), t2.newsArticleID, t2.newsPhotoID
    FROM uk_newsreach_article t1
        INNER JOIN uk_newsreach_article_photo t2
            ON t1.id = t2.newsArticleID
        INNER JOIN uk_newsreach_photo t3
            ON t2.newsPhotoID = t3.id
    GROUP BY t2.newsArticleID, t2.newsPhotoID
    ORDER BY t1.publishDate DESC;
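
    If you also need the article and photo columns for the feed, one option is to collapse the duplicates in a derived table before joining. This is only a sketch: it assumes the duplicates live solely in uk_newsreach_article_photo, and the selected columns are just examples taken from the table structures in the question.

    ```sql
    SELECT t1.id, t1.headline, t1.publishDate,
           t3.htmlAlt, t3.URL, t3.height, t3.width
    FROM uk_newsreach_article t1
        INNER JOIN (
            -- one row per article/photo pair, however many duplicates exist
            SELECT newsArticleID, newsPhotoID
            FROM uk_newsreach_article_photo
            GROUP BY newsArticleID, newsPhotoID
        ) t2
            ON t1.id = t2.newsArticleID
        INNER JOIN uk_newsreach_photo t3
            ON t2.newsPhotoID = t3.id
    ORDER BY t1.publishDate DESC;
    ```

    Because the grouping happens inside the derived table, the outer query has no GROUP BY and can select and order by any article or photo column.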


    Now, while this is an easy solution to your immediate problem, if you decide that duplicates should not happen, you really should consider redesigning your tables so that duplicates cannot get in, in the first place.
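
    As a rough sketch of what that could look like, assuming MySQL and assuming each article/photo pair should appear only once in uk_newsreach_article_photo: remove the existing duplicates, then add a unique constraint. The constraint name uq_article_photo and the aliases dup and orig are made up for illustration, and you would want a backup before running the DELETE.

    ```sql
    -- Step 1: delete every duplicate row, keeping the lowest id of each pair.
    -- A row is deleted when another row with the same pair has a smaller id.
    DELETE dup
    FROM uk_newsreach_article_photo dup
        INNER JOIN uk_newsreach_article_photo orig
            ON  orig.newsArticleID = dup.newsArticleID
            AND orig.newsPhotoID   = dup.newsPhotoID
            AND orig.id < dup.id;

    -- Step 2: make the database reject the same pair from now on.
    -- (uq_article_photo is a hypothetical constraint name.)
    ALTER TABLE uk_newsreach_article_photo
        ADD CONSTRAINT uq_article_photo UNIQUE (newsArticleID, newsPhotoID);
    ```

    Once the constraint is in place, whatever imports the articles will get an error on a duplicate insert instead of silently adding another row, so the import code may need to handle or ignore that error.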