Search code examples
rgroupingranking

Is it possible to rank data after grouping it by observations in R Programming language?


I have data on all the US Hospitals regarding certain attributes. A representative sample may look like the one shown below:

  Hospital.Name    State       HA      HF      PN    Rank
1    HOSPITAL 1       NY     10.1     7.1     7.7       1
2    HOSPITAL 2       PA     10.4    11.2    11.6       2
3    HOSPITAL 3       SD     10.5     9.9    11.6       3
4    HOSPITAL 4       CA     10.5     8.4     9.7       4
5    HOSPITAL 5       CT     10.6    11.9    11.8       5
6    HOSPITAL 6       PA     10.7     8.3    10.9       6

The data shown above is already part processed - arranged and ranked by the attribute "HA" and a column ("Rank") has been added to reflect that.

However, this is not what I need. I am trying to rank this data, grouped by state, for these attributes.

So, the final output would be somewhat like below:

  Hospital.Name    State       HA      HF      PN    Rank
1    HOSPITAL 1       AK     12.1     7.1     7.7       1
2    HOSPITAL 2       AK     12.4    11.2    11.6       2
3    HOSPITAL 3       AK     14.5     9.9    11.6       3
4    HOSPITAL 4       AL     11.5     8.4     9.7       1
5    HOSPITAL 5       AL     15.6    11.9    11.8       2
6    HOSPITAL 6       AL     16.7     8.3    10.9       3

Please note that value of attribute "HA" for rank 1 of a particular state could be higher or lower than value of attribute HA for rank 1 of another state. Essentially, the ranks for the attribute "HA", within states, are independent.

So, far, I have tried using order and rank functions, nestled, within mutate function. However, both throw up the following error:

Error: incompatible size (%d), expecting %d (the group size) or 1

How can I achieve what I need? Any help / guidance / advice would be highly appreciated.

**Please note that I have data for all the 50 states running into thousands of rows and what is shown in table 1 above is only representative data.

Also, please note that the data in Table 2 above is a sample of the desired output and not the whole output. **


Solution

  • Here is another solution using dplyr. You can group by the State and then do the ranking:

    dt <- data.table(Hospital.Name = "Hospital", 
               State = rep(c("AK", "AL"), each = 2), 
               HA = c(10,12,11,14)) 
    
    dt %>% 
      group_by(State) %>%
      mutate(Rank = rank(HA))
    

    output:

      Hospital.Name State HA Rank
    1      Hospital    AK 10    1
    2      Hospital    AK 12    2
    3      Hospital    AL 11    1
    4      Hospital    AL 14    2
    

    One more word on the confusion that came up:

    rank(dt$HA) gives a vector (here size 4) whereas rank(HA) gives a scalar which represents the current rank of the current row within the column HA (also on the group basis if specified).