I have a data frame in R, dat, that contains the columns Gene, Expression, and SampleID. What I've done is sort dat such that it's grouped by Gene and arranged by descending Expression for each gene, using the following:
dat_sorted <- dat %>% select(Gene, Expression, SampleID) %>%
group_by(Gene) %>%
arrange(Gene, desc(Expression))
What I now wish to do is add a Rank column to dat_sorted which would apply a rank within each Gene group based on the Expression value, such that, for a given gene, a sample is ranked higher (closer to 1) the higher its expression is.
Here's an example of what the outcome should look like:
Gene Expression Sample Rank
ENSG00000000003 2.81561500 HSB671 1
ENSG00000000003 2.79336700 HSB431 2
ENSG00000000003 2.40009100 HSB618 3
ENSG00000000938 1.75148448 HSB671 1
ENSG00000000938 1.52182467 HSB670 2
ENSG00000000938 0.83478860 HSB414 3
ENSG00000000938 0.62174432 HSB459 4
Looks like people forgot about your question. Hope this doesn't come too late ^^
library(dplyr)
# Assign the result back so that printing df shows the new Rank column
df <- df %>% group_by(Gene) %>% mutate(Rank = dense_rank(desc(Expression)))
> df
# A tibble: 7 x 4
# Groups: Gene [2]
Gene Expression Sample Rank
<chr> <dbl> <chr> <dbl>
1 ENSG00000000003 2.82 HSB671 1
2 ENSG00000000003 2.79 HSB431 2
3 ENSG00000000003 2.40 HSB618 3
4 ENSG00000000938 1.75 HSB671 1
5 ENSG00000000938 1.52 HSB670 2
6 ENSG00000000938 0.835 HSB414 3
7 ENSG00000000938 0.622 HSB459 4
Or with base R:
df$Rank <- ave(-df$Expression, df$Gene, FUN = rank)
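One thing to watch with the base R version is how ties are handled: by default rank() gives tied values the average of their ranks, whereas dense_rank() gives them the same integer with no gaps. The sketch below is only an illustration on made-up data (the toy data frame and its values are not from the question), showing three tie-handling choices using ave() alone:

```r
# Made-up toy data with a deliberate tie in gene "A" (not from the question)
toy <- data.frame(
  Gene = c("A", "A", "A", "A", "B", "B"),
  Expression = c(3.0, 2.5, 2.5, 1.0, 1.2, 0.7)
)

# Default rank(): tied values get the average of the ranks they span
toy$RankAvg <- ave(-toy$Expression, toy$Gene, FUN = rank)

# ties.method = "min": tied values share the lowest rank, leaving a gap after
toy$RankMin <- ave(-toy$Expression, toy$Gene,
                   FUN = function(x) rank(x, ties.method = "min"))

# Dense ranking (no gaps), comparable to dplyr::dense_rank(desc(Expression))
toy$RankDense <- ave(-toy$Expression, toy$Gene,
                     FUN = function(x) match(x, sort(unique(x))))

print(toy)
```

If your Expression values never tie within a gene, all three variants give the same ranks, so the choice only matters when duplicates are possible.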