r ranking permutation dataframe

How to select the best rank according to quantitative values of another variable


I created a dataframe that looks like this:

# Dataframe
GeneID              TrID                PSI  Length Ranking  
ENSMUSG00000089809  ENSMUST00000146396  0.20 431801  3     
ENSMUSG00000089809  ENSMUST00000161516  0.23 354036  2  
ENSMUSG00000089809  ENSMUST00000161148  0.57   5601  1  
ENSMUSG00000044681  ENSMUST00000117098  0.05   4400  2  
ENSMUSG00000044681  ENSMUST00000141196  0.10   1118  1  
ENSMUSG00000044681  ENSMUST00000141601  0.75  44973  5  

Now I would like to select, for each GeneID, the TrID with the highest PSI value, together with its Ranking. The output should look like this:

# Desired Output Dataframe
GeneID             TrID               PSI Length Ranking     
ENSMUSG00000089809 ENSMUST00000161148 0.57  5601 1      
ENSMUSG00000044681 ENSMUST00000141601 0.75 44973 5      

After that, I will build the distribution of the Ranking values of the selected transcripts and check which PSI values each rank corresponds to. As a control for that distribution, I will permute the Length and TrID values.
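One possible reading of this control step, as a rough sketch only: keep PSI fixed, shuffle Ranking within each gene, and record which Ranking lands on the highest-PSI transcript. Shuffling Ranking stands in for permuting Length here, on the assumption (not confirmed above) that Ranking is derived from Length within each gene. The names `topRows`, `permuteOnce`, `nullRankings` and the count of 1000 permutations are illustrative choices.

```r
# Sample rows copied from the table above
Dataframe <- data.frame(
  GeneID  = rep(c("ENSMUSG00000089809", "ENSMUSG00000044681"), each = 3),
  TrID    = c("ENSMUST00000146396", "ENSMUST00000161516", "ENSMUST00000161148",
              "ENSMUST00000117098", "ENSMUST00000141196", "ENSMUST00000141601"),
  PSI     = c(0.20, 0.23, 0.57, 0.05, 0.10, 0.75),
  Length  = c(431801, 354036, 5601, 4400, 1118, 44973),
  Ranking = c(3, 2, 1, 2, 1, 5),
  stringsAsFactors = FALSE
)

set.seed(1)  # arbitrary seed so the shuffles are reproducible

# Row index of the highest-PSI transcript within each gene
topRows <- unlist(lapply(
  split(seq_len(nrow(Dataframe)), Dataframe$GeneID),
  function(i) i[which.max(Dataframe$PSI[i])]
))

# Observed distribution of Ranking among those transcripts
observed <- table(Dataframe$Ranking[topRows])

# One permutation: shuffle Ranking within each gene while PSI stays fixed,
# then read off the shuffled Ranking at the top-PSI rows
permuteOnce <- function(df, rows) {
  shuffled <- ave(df$Ranking, df$GeneID,
                  FUN = function(x) x[sample.int(length(x))])
  shuffled[rows]
}

# 1000 permutations (illustrative count); one column per permutation
nullRankings <- replicate(1000, permuteOnce(Dataframe, topRows))
nullDistribution <- table(nullRankings)
```

Comparing `observed` with `nullDistribution` (after scaling by the number of permutations) would then show whether the top-PSI transcripts favour particular ranks more than chance would suggest.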


Solution

  • You can use base R and do:

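    # Row indices grouped by gene
    byGeneID = split(seq_len(nrow(Dataframe)), Dataframe$GeneID)
    # For each gene, the row index with the largest PSI
    whichTopPsi = sapply(byGeneID, function(i) i[which.max(Dataframe[i,'PSI'])])
    # Keep only those rows
    Dataframe[whichTopPsi,]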
    

    You could also use ddply, which is more general.

    library(plyr)
    # For each GeneID, keep the single row with the largest PSI
    ddply(Dataframe, "GeneID", function(d) d[which.max(d[,'PSI']),,drop=FALSE])
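
To try this on the sample rows from the question without plyr, here is a minimal base R sketch using a different technique (sort by PSI, then drop repeated genes). The intermediate names `sorted` and `topPsi` are illustrative, and ties in PSI are resolved by whichever row sorts first.

```r
# Sample rows copied from the question
Dataframe <- data.frame(
  GeneID  = rep(c("ENSMUSG00000089809", "ENSMUSG00000044681"), each = 3),
  TrID    = c("ENSMUST00000146396", "ENSMUST00000161516", "ENSMUST00000161148",
              "ENSMUST00000117098", "ENSMUST00000141196", "ENSMUST00000141601"),
  PSI     = c(0.20, 0.23, 0.57, 0.05, 0.10, 0.75),
  Length  = c(431801, 354036, 5601, 4400, 1118, 44973),
  Ranking = c(3, 2, 1, 2, 1, 5),
  stringsAsFactors = FALSE
)

# Sort by descending PSI, then keep the first row seen for each gene
sorted <- Dataframe[order(-Dataframe$PSI), ]
topPsi <- sorted[!duplicated(sorted$GeneID), ]

# Restore the gene order of the original data frame
topPsi <- topPsi[match(unique(Dataframe$GeneID), topPsi$GeneID), ]
print(topPsi)
```

The selected rows carry their Ranking along, so `table(topPsi$Ranking)` gives the distribution of ranks among the top-PSI transcripts.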