Search code examples
sqldatasetfrequency

display most frequent string


I have a hospital data set where each patient is admitted with a diagnosis. I want to display each patient once with the diagnosis each patient is admitted with with most times. right now each patient that have multiple diagnoses appear in multiple rows.

the dataset is similar to this

$ Subject_id    diagnosis
 1112          SEIZURE
 1112          GASTROINTESTINAL BLEED
 1112          SEIZURE
 1113          FEVER

how do I collapse and display the multiple diagnosis to one row and show by the most frequently used diagnosis?


Solution

  • You can use window functions and aggregation:

    select sd.*
    from (select subject_id, diagnosis, count(*) as cnt,
                 row_number() over (partition by subject_id order by count(*) desc) as seqnum
          from t
          group by subject_d, diagnosis
         ) sd
    where seqnum = 1;
    

    Note: If there are ties for the most common diagnosis, this arbitrarily returns one row. If you want all rows in the event of ties, then use rank() instead of row_number().