I am trying to left join two tables and retrieve all columns from table one but remove duplicates based on a set of columns.
SELECT A.*, B.impact
FROM #Site_one AS A WITH (NOLOCK)
LEFT JOIN #Progress AS B With (NOLOCK)
ON lower(A.site_code) = lower(B.site_code)
GROUP BY A.date, A.operationid, A.worklocation, A.siteid, A.alias
This does not work as there will be column in A which either need to be aggregated or be added to the group by
clause. The issue with that is that I do not want to filter on those columns and do not want them aggregated.
Is there a way to select all columns in A and the impact column in B and still be able to filter out duplicates on the columns specified in the group by clause?
Any pointers/help would be greatly appreciated.
and still be able to filter out duplicates on the columns specified in the group by clause
But, how does the database really know which rows to throw away? Suppose you have:
Person
John, 42, Stockbroker
John, 36, Train driver
John, 58, Retired
John, 58, Metalworker
And you think "I wanna dedupe those based on the name":
SELECT * FROM person GROUP BY name
So which three Johns should the DB throw away?
It cannot decide this for you; you have to write the query to make it clear what you want to keep or throw
You could MAX everything:
SELECT name, MAX(age), MAX(job) FROM person GROUP BY name
That'll work.. but it gives you a John that never existed in the original data:
John, 58, Train driver
You could say "I'll only keep the person with the max age":
SELECT p.*
FROM
person p
INNER JOIN (SELECT name, max(age) as maxage FROM person GROUP BY name) maxp
ON p.name = maxp.name AND p.age = maxp.maxage
.. but there are two people with the same max age.
Your DB might have a row number analytic, which is nice:
SELECT *, row_number() over(PARTITION BY name ORDER BY age DESC) rn
FROM person
One of your 58 year old Johns will get row number 1 - can't be sure which one, but you could then discard all the rows with an rn > 1:
WITH x as (
SELECT *, row_number() over(PARTITION BY name ORDER BY age DESC) rn
FROM person
)
SELECT name, age, job
INTO newtable
FROM x
WHER rn = 1
..but what if you discarded the wrong John...
You're going to have to go and think about this some more, and exactly specify what to throw away...