sql postgresql duplicates aggregate-functions window-functions

Find rows with duplicate values in a column

I have a table author_data:

 author_id | author_name
 ----------+----------------
 9         | ernest jordan
 14        | k moribe
 15        | ernest jordan
 25        | william h nailon 
 79        | howard jason
 36        | k moribe

Now I need the result as:

 author_id | author_name                                                  
 ----------+----------------
 9         | ernest jordan
 15        | ernest jordan     
 14        | k moribe 
 36        | k moribe

That is, I need the author_id for the names having duplicate appearances. I have tried this statement:

select author_id,count(author_name)
from author_data
group by author_name
having count(author_name)>1

But it's not working. How can I get this?

Solution

I suggest a window function in a subquery:

SELECT author_id, author_name  -- omit the name here if you just need ids
FROM (
   SELECT author_id, author_name
        , count(*) OVER (PARTITION BY author_name) AS ct
   FROM   author_data
   ) sub
WHERE  ct > 1;

You will recognize the basic aggregate function count(). It can be turned into a window function by appending an OVER clause - just like any other aggregate function.

This way it counts rows per partition. Voilá.

It has to be done in a subquery because the result cannot be referenced in the WHERE clause in the same SELECT (happens after WHERE). See:

Best way to get result count before LIMIT was applied

In older versions without window functions (v.8.3 or older) - or generally - this alternative performs pretty fast:

SELECT author_id, author_name  -- omit name, if you just need ids
FROM   author_data a
WHERE  EXISTS (
   SELECT FROM author_data a2
   WHERE  a2.author_name = a.author_name
   AND    a2.author_id <> a.author_id
   );

If you are concerned with performance, add an index on author_name.