I have a table that looks like below:
person | fruit | date |
---|---|---|
A | apple | xxxx |
B | banana | xxxx |
C | apple | xxxx |
A | banana | xxxx |
C | apple | xxxx |
B | banana | xxxx |
I am interested in persons who have more than one banana in the data set. In this case, it would be person B. I understand how to achieve this by aggregating the data. However, if I want my result to be NOT agrregated and look something like below, what would be the best way?
person | fruit | date |
---|---|---|
B | banana | xxxx |
B | banana | xxxx |
You can achieve this by using window function in subquery (in this case with count_if
aggregate function) and filtering the result:
-- sample data
WITH dataset(person, fruit, date) AS (
VALUES ('A', 'apple' ,'xxxx'),
('B', 'banana' ,'xxxx'),
('C', 'apple' ,'xxxx'),
('A', 'banana' ,'xxxx'),
('C', 'apple' ,'xxxx'),
('B', 'banana' ,'xxxx')
)
-- query
select person, fruit, date
from (select *,
count_if(fruit = 'banana') over(partition by person) banana_count
from dataset)
where banana_count > 1
Output:
person | fruit | date |
---|---|---|
B | banana | xxxx |
B | banana | xxxx |