Search code examples
mysqlsqlgreatest-n-per-group

Get rid of duplicate rows in SQL query


First of all, this is a database I was given, and it is a pain to work with.

I have this table:

content_id version content_version_id alias publish_date contentattribute_name contentattribute_value
4507 4 10369 /aktuelt/arkiv/ 04/04/2014 category_names CAT1, CAT2
4348 3 10373 /publikasjoner/ 23/05/2011 category_names CAT1, CAT2
4348 4 10374 /publikasjoner/ 23/05/2011 category_names CAT1, CAT2
4439 3 10701 /publikasjoner/ 08/04/2015 category_names CAT1, CAT2
4439 4 10702 /publikasjoner/ 08/04/2015 category_names CAT1, CAT2

I am trying to get rid of duplicates from the column "content_id", and I've searched for answers on stackoverflow, but I couldn't get it to work with my SQL query.

Can anyone help me with the current query?

SELECT db_name.contentversion.ContentId as 'content_id',
db_name.contentversion.Version as 'version',
db_name.contentversion.ContentVersionId as 'content_version_id', 
db_name.content.alias as 'alias',
db_name.content.PublishDate as 'publish_date', 
db_name.contentattributes.Name as 'contentattribute_name', 
db_name.contentattributes.Value as 'contentattribute_value'
FROM db_name.contentversion
JOIN db_name.content
ON db_name.content.ContentId = db_name.contentversion.ContentId
JOIN db_name.contentattributes
ON db_name.contentattributes.ContentVersionId = db_name.contentversion.ContentVersionId

EDIT

I think I managed to solve it by using this query:

SELECT cv.ContentId, cv.Version, cv.ContentVersionId,
db_name.content.alias AS "alias",
db_name.content.PublishDate AS "publish_date", 
db_name.contentattributes.Name AS "contentattribute_name", 
db_name.contentattributes.Value AS "contentattribute_value"
FROM contentversion cv
INNER JOIN (
    SELECT ContentId, MAX(Version) Version
    FROM contentversion
    GROUP BY ContentId
) b ON cv.ContentId = b.ContentId AND cv.Version = b.Version
JOIN db_name.content
ON db_name.content.ContentId = cv.ContentId
JOIN db_name.contentattributes
ON db_name.contentattributes.ContentVersionId = cv.ContentVersionId

Solution

  • The final query that worked for me was (thanks to Thorsten Kettner):

    SELECT cv.ContentId, cv.Version, cv.ContentVersionId,
    db_name.content.alias AS "alias",
    db_name.content.PublishDate AS "publish_date", 
    db_name.contentattributes.Name AS "contentattribute_name", 
    db_name.contentattributes.Value AS "contentattribute_value"
    FROM contentversion cv
    INNER JOIN (
        SELECT ContentId, MAX(Version) Version
        FROM contentversion
        GROUP BY ContentId
    ) b ON cv.ContentId = b.ContentId AND cv.Version = b.Version
    JOIN db_name.content
    ON db_name.content.ContentId = cv.ContentId
    JOIN db_name.contentattributes
    ON db_name.contentattributes.ContentVersionId = cv.ContentVersionId