I have a hospital data set where each patient is admitted with a diagnosis. I want to display each patient once with the diagnosis each patient is admitted with with most times. right now each patient that have multiple diagnoses appear in multiple rows.
the dataset is similar to this
$ Subject_id diagnosis
1112 SEIZURE
1112 GASTROINTESTINAL BLEED
1112 SEIZURE
1113 FEVER
how do I collapse and display the multiple diagnosis to one row and show by the most frequently used diagnosis?
You can use window functions and aggregation:
select sd.*
from (select subject_id, diagnosis, count(*) as cnt,
row_number() over (partition by subject_id order by count(*) desc) as seqnum
from t
group by subject_d, diagnosis
) sd
where seqnum = 1;
Note: If there are ties for the most common diagnosis, this arbitrarily returns one row. If you want all rows in the event of ties, then use rank()
instead of row_number()
.