Search code examples
sql-server-2012distinct-values

Duplicates using distinct operator


I have below tables.

create table html_details(A int,b int,c int,d nvarchar(4))

create table pdf_details(A int,b int,c int,d nvarchar(4))

insert into pdf_details values(1,2,3,'pdf')
insert into pdf_details values(1,2,3,'pdf')
insert into pdf_details values(4,5,6,'pdf')

insert into html_details values(1,2,3,'html')
insert into html_details values(1,2,3,'html')
insert into html_details values(4,5,6,'html')

now i am using below query to avoid duplication in each tables.

select distinct a,b,c,d from html_details
union all
select distinct a,b,c,d from pdf_details

but above query gives poor performance because of distinct function in both query.so i am using distinct in outer query.Now performance is improved, but will it give same output?Both query are same in logic?

select distinct a,b,c,d from (
select  a,b,c,d from html_details
union all
select a,b,c,d from pdf_details
)a

Solution

  • No. It will not return the same output.

    Distinct in individual queries will get you unique records from both the queries and then it will be union-ed. So if there are similar rows in both the queries results, both of them will be present in final result.

    Lets say your data is:

    Table 1:

    1,2,3,pdf 1,2,3,pdf 1,2,3,hello

    Table 2:

    1,2,3,html 1,2,3,html 1,2,3,hello

    First approach's result will be (There's no distinct in final response) -

    1,2,3,pdf 1,2,3,hello 1,2,3,html 1,2,3,hello

    Second approach's result will be (There's a distinct in final response) -

    1,2,3,pdf 1,2,3,html 1,2,3,hello

    I hope this explains.