This is a problem I always struggle with.
Suppose I have the following table (table_a):
CREATE TABLE table_a (
name INT,
height INT,
year INT
);
INSERT INTO table_a (name, height, year)
VALUES
(1, 180, 2010),
(1, 180, 2010),
(1, 180, 2010),
(2, 175, 2020),
(2, 175, 2020),
(3, 150, 2019);
name height year
1 180 2010
1 180 2010
1 180 2010
2 175 2020
2 175 2020
3 150 2019
I only want each name to appear once in the final result - I don't care which one:
name height year
1 180 2010
2 175 2020
3 150 2019
Normally, I would have used the rank/partition statements and order by some non-unique column. However in this case, there is no non-unique column. Thus, ranking will give each name the same rank, so I can't do a rank/partition and select where rank=1.
I thought of this indirect way to solve the problem:
select * from (
select *,
row_number() over (partition by name order by (select null)) as rn
from table_a)g
where rn=1);
Is this the standard way to solve this kind of problem?
Thanks!
A regular distinct select should work here:
SELECT DISTINCT name, height, year
FROM table_a
ORDER BY name;
But this assumes that all 3 columns would always have the same value per name
group. If this is not the case, then you could use ROW_NUMBER()
with a random ordering:
WITH cte AS (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY RAND()) rn
FROM table_a t
)
SELECT name, height, year
FROM cte
WHERE rn = 1;