
Get the rank within each group for an R data frame


I have a data frame in R, dat, that contains the columns Gene, Expression, and SampleID. I've sorted dat so that it's grouped by Gene and arranged by descending Expression within each gene, using the following:

dat_sorted <- dat %>% select(Gene, Expression, SampleID) %>%
    group_by(Gene) %>% 
    arrange(Gene, desc(Expression))

What I now wish to do is add a Rank column to dat_sorted that ranks the samples within each Gene group by their Expression value, such that, for a given gene, the sample with the highest expression gets rank 1, the next highest gets rank 2, and so on.

Here's an example of what the outcome should look like:

Gene                Expression      Sample      Rank
ENSG00000000003     2.81561500      HSB671      1
ENSG00000000003     2.79336700      HSB431      2
ENSG00000000003     2.40009100      HSB618      3
ENSG00000000938     1.75148448      HSB671      1
ENSG00000000938     1.52182467      HSB670      2
ENSG00000000938     0.83478860      HSB414      3
ENSG00000000938     0.62174432      HSB459      4

Solution

  • Looks like people forgot about your question. Hope this doesn't come too late ^^

    library(dplyr)
    
    # Assign the result back to df, otherwise the new Rank column is not kept
    df <- df %>% group_by(Gene) %>% mutate(Rank = dense_rank(desc(Expression)))
    
    > df
    # A tibble: 7 x 4
    # Groups:   Gene [2]
      Gene            Expression Sample  Rank
      <chr>                <dbl> <chr>  <dbl>
    1 ENSG00000000003      2.82  HSB671     1
    2 ENSG00000000003      2.79  HSB431     2
    3 ENSG00000000003      2.40  HSB618     3
    4 ENSG00000000938      1.75  HSB671     1
    5 ENSG00000000938      1.52  HSB670     2
    6 ENSG00000000938      0.835 HSB414     3
    7 ENSG00000000938      0.622 HSB459     4
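
One thing worth checking is how ties should be handled, since the example data has none. Here is a minimal sketch comparing dplyr's ranking helpers on a made-up data frame (toy and its values are hypothetical, not taken from the question) where two samples of the same gene share an expression value:

```r
library(dplyr)

# Hypothetical toy data with a tie inside gene "A" (not from the question).
toy <- data.frame(
  Gene       = c("A", "A", "A", "B", "B"),
  Expression = c(5, 5, 3, 2, 7)
)

ranked <- toy %>%
  group_by(Gene) %>%
  mutate(
    Dense = dense_rank(desc(Expression)),  # ties share a rank, no gaps: 1, 1, 2
    Min   = min_rank(desc(Expression)),    # ties share a rank, gap after: 1, 1, 3
    Row   = row_number(desc(Expression))   # ties broken by row order: 1, 2, 3
  ) %>%
  ungroup()

print(ranked)
```

If every sample needs a distinct rank, row_number() is probably the one to use; if tied samples should share a rank, dense_rank() or min_rank() differ only in whether the following rank skips a number.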
    
    

    Or with base R:

    df$Rank <- ave(-df$Expression, df$Gene, FUN = rank)
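
Note that base rank() uses ties.method = "average" by default, so tied expression values within a gene get fractional ranks, which is not what dense_rank() returns. A rough sketch of the alternatives, again on a made-up data frame (toy and its column names are assumptions for illustration only):

```r
# Hypothetical toy data with a tie inside gene "A" (not from the question).
toy <- data.frame(
  Gene       = c("A", "A", "A", "B", "B"),
  Expression = c(5, 5, 3, 2, 7)
)

# Default ties.method = "average": the two tied rows in gene "A" both get 1.5.
toy$RankAvg <- ave(-toy$Expression, toy$Gene, FUN = rank)

# ties.method = "min" gives competition ranking (1, 1, 3), like dplyr::min_rank().
toy$RankMin <- ave(-toy$Expression, toy$Gene,
                   FUN = function(x) rank(x, ties.method = "min"))

# Dense ranking (1, 1, 2): position of each value among the sorted unique values.
toy$RankDense <- ave(-toy$Expression, toy$Gene,
                     FUN = function(x) match(x, sort(unique(x))))

print(toy)
```

Because ave() writes the results back into a copy of its numeric first argument, all three columns come out as doubles; wrap them in as.integer() if integer ranks are needed.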