r, sorting, dataframe, ranking

R - how to rank a data frame and display its rows starting from a given rank?


I have a data frame that lists the names of individuals and their monetary transactions in USD. The data cover several districts and record the valid transactions made by either cash or credit card, like so:

X    Dist    transact.cash    transact.card
a    1       USD              USD
b    1       USD              USD

Here X is an individual, whose transactions are recorded over a fixed period of time, and Dist is the district where he/she resides. There are over 4000 observations in total, with approximately 80-100 rows per Dist. So far, sorting, slicing and everything else have been simple operations, with dat.cash and dat.card being subsets of the table by mode of transaction; but I'm having problems extracting rows by rank. For this, I have written a function where I specify a rank and the function should show the rows starting from that rank:

rankdat <- function(transact, numb) {
  # Truncated
  valid.nums = c('highest', 'lowest', 1:nrow(dat.cash)) # for cash subset
  if (transact == 'cash' && numb == 'highest') { # This is easy
    sort <- dat.cash[order(dat.cash[, 3], decreasing = T), ] # For sorting only cash data set
  } else if (transact == 'cash' && numb == 1:nrow(dat.cash)) {
    sort <- dat.cash[order(dat.cash[, 3], decreasing = T) == numb, ] # Not getting results here
  }
}

The last line returns NULL instead of the ranked transaction and all its rows. Replacing == with %in% still gives NULL, and using rank() doesn't change anything. The highest and lowest cases are not a big deal, since they only involve simple sorting. If I call rankdat('cash', 10), the function should return the rows starting from the 10th highest transaction, in decreasing order and irrespective of Dist, similar to:

 X    Dist    transact.cash
 b    1       10th highest
 h    2       11th highest
 p    1       12th highest
 and  so      on

Solution

  • This function is able to do that:

    rankdat <- function(df, rank.by, num = 10, method = "top", decreasing = TRUE) {
      # ------------------------------------------------------
      # RANKDAT
      # ------------------------------------------------------
      # ARGUMENTS
      # =========
      # df          Input data frame [data.frame]
      # rank.by     Name(s) of the column(s) used to rank df [character vector]
      # num         Number of rows (method "top"), or row position(s) in the
      #             ranked data, e.g. 10 or 10:20 (method "specific") [numeric]
      # method      Method used to extract rows
      #               top      - select the first num rows of the ranking
      #               specific - select the row(s) at rank num
      # decreasing  Rank from highest to lowest if TRUE [logical]
      # ------------------------------------------------------
      # Order df by the rank.by column(s); decreasing is passed to order() itself
      keys <- unname(as.list(df[rank.by]))
      sorted <- df[do.call(order, c(keys, list(decreasing = decreasing))), ]
      if (method == "top") {
        return(head(sorted, num))
      } else if (method == "specific") {
        return(sorted[num, ])
      } else {
        stop("method must be \"top\" or \"specific\"")
      }
    }
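
For the behaviour asked for in the question (every row from the n-th highest transaction downwards), a minimal sketch could look like the following. The toy `dat.cash` values are made up, and `rows_from_rank()` is a hypothetical helper name; neither comes from the original post. The sketch also hints at why `order(...) == numb` did not work: `order()` returns the row indices in sorted order, not the rank of each row, so the sorted indices have to be used for subsetting rather than compared with the wanted rank.

```r
# Toy data (made-up values, for illustration only)
dat.cash <- data.frame(
  X = c("a", "b", "c", "d", "e"),
  Dist = c(1, 1, 2, 2, 1),
  transact.cash = c(50, 20, 90, 10, 70),
  stringsAsFactors = FALSE
)

# order() returns the row indices that would sort the column,
# here from the highest to the lowest transaction
idx <- order(dat.cash$transact.cash, decreasing = TRUE)

# Hypothetical helper: all rows from rank `from` downwards
rows_from_rank <- function(df, col, from = 1) {
  idx <- order(df[[col]], decreasing = TRUE)
  # Asking for a rank beyond the last row yields an empty data frame
  if (from > length(idx)) return(df[0, , drop = FALSE])
  df[idx[from:length(idx)], , drop = FALSE]
}

# Rows from the 3rd highest transaction downwards
res <- rows_from_rank(dat.cash, "transact.cash", from = 3)
print(res)
```

With the real data, the call would presumably be something like `rows_from_rank(dat.cash, "transact.cash", from = 10)`, assuming the cash amounts are stored in a numeric column of that name.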