Search code examples
sqlpostgresqlduplicatesaggregate-functionswindow-functions

Find rows with duplicate values in a column


I have a table author_data:

 author_id | author_name
 ----------+----------------
 9         | ernest jordan
 14        | k moribe
 15        | ernest jordan
 25        | william h nailon 
 79        | howard jason
 36        | k moribe

Now I need the result as:

 author_id | author_name                                                  
 ----------+----------------
 9         | ernest jordan
 15        | ernest jordan     
 14        | k moribe 
 36        | k moribe

That is, I need the author_id for the names having duplicate appearances. I have tried this statement:

select author_id,count(author_name)
from author_data
group by author_name
having count(author_name)>1

But it's not working. How can I get this?


Solution

  • I suggest a window function in a subquery:

    SELECT author_id, author_name  -- omit the name here if you just need ids
    FROM (
       SELECT author_id, author_name
            , count(*) OVER (PARTITION BY author_name) AS ct
       FROM   author_data
       ) sub
    WHERE  ct > 1;
    

    You will recognize the basic aggregate function count(). It can be turned into a window function by appending an OVER clause - just like any other aggregate function.

    This way it counts rows per partition. Voilá.

    It has to be done in a subquery because the result cannot be referenced in the WHERE clause in the same SELECT (happens after WHERE). See:

    In older versions without window functions (v.8.3 or older) - or generally - this alternative performs pretty fast:

    SELECT author_id, author_name  -- omit name, if you just need ids
    FROM   author_data a
    WHERE  EXISTS (
       SELECT FROM author_data a2
       WHERE  a2.author_name = a.author_name
       AND    a2.author_id <> a.author_id
       );
    

    If you are concerned with performance, add an index on author_name.