Search code examples
sqlsubqueryaverageaggregate-functionsdistinct

DISTINCT not giving expected result


ID  FirstName   LastName    Gender  Salary
1   Ben         Hoskins     Male    70000
2   Mark        Hastings    Male    60000
4   Ben         Hoskins     Male    70000
8   John        Stanmore    Male    80000

While running the query:

select *
from Employees
where  Salary > (SELECT AVG(distinct SALARY) FROM employees)

It is displaying 2 records for 7000 it should display one. Why is it showing 2 records if I have used distinct?


Solution

  • You have used distinct the subquery, not in the outer query, so the outer query still sees the data with the duplicates.

    You could solve this using:

    select e.FirstName, e.LastName, e.Gender, e.Salary
    from Employees e
    where e.Salary > (SELECT AVG(distinct e2.SALARY) FROM employees e2);
    

    However, something seems seriously wrong with the data model if you have rows with such duplicates. The data should be fixed.

    In the meantime, you can work around the problem. You can phrase your query using a CTE that eliminates the duplicates:

    with e as (
          select e.*
          from (select e.*,
                       row_number(). over (partition by firstname, lastname, gender order by id desc) as seqnum
                from employees e
               )
          where seqnum = 1
         )
    select e.*
    from e
    where e.salary > (select avg(salary) from e)