Ranking System for a Categorical Matrix in R


I have a data frame in which each column lists one child's five favorite animals. I want to calculate the frequency of each animal and its mean rank position across the entire data set.

The data in each column is sorted by rank in ascending order, so the row position is the rank (see data below). For example, in Var1, Dog has a rank of 1 and Roach has a rank of 5. The mean ranking for Dog would be (1+2+3)/3 = 2 and its frequency would be 3 across the entire data set. For Cat, the mean ranking is (2+1+2)/3 = 1.67 and the frequency is 3. For Roach, the mean ranking is (5+3+4)/3 = 4 and the frequency is 3.

    Var1 <- c('Dog','Cat','Chicken','Bird','Roach')
    Var2 <- c('Cat','Dog','Roach','Turtle','Bird')
    Var3 <- c('Bird','Cat','Dog','Roach','Zebra')

    animal.data <- data.frame(Var1, Var2, Var3)
    print(animal.data)

I am not sure what the most efficient way to implement this is. My full data set has over 500 columns. Thank you.


Solution

  • A base R approach,

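    # All distinct animals appearing anywhere in the data frame
    animals <- unique(unlist(animal.data))
    out <- list()
    for (i in animals) {
        # Row/column indices of every cell equal to animal i;
        # the row index is the animal's rank in that column
        x <- which(animal.data == i, arr.ind = TRUE)
        avg <- round(mean(x[, 1]), 2)   # mean rank
        freq <- nrow(x)                 # number of occurrences
        out[[i]] <- data.frame(Mean = avg, Freq = freq)
    }

    # Stack the per-animal rows into a single data frame
    do.call(rbind, out)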

    gives,

            Mean Freq
    Dog     2.00    3
    Cat     1.67    3
    Chicken 3.00    1
    Bird    3.33    3
    Roach   4.00    3
    Turtle  4.00    1
    Zebra   5.00    1
    

    Data:

    Var1 <- c('Dog','Cat','Chicken','Bird','Roach')
    Var2 <- c('Cat','Dog','Roach','Turtle','Bird')
    Var3 <- c('Bird','Cat','Dog','Roach','Zebra')

    animal.data <- data.frame(Var1, Var2, Var3, stringsAsFactors = FALSE)
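
For a wide data set such as the 500+ columns mentioned in the question, the per-animal loop above can also be written without an explicit loop. The following is only a sketch of one possible alternative, not part of the original answer: it flattens the data frame, pairs each value with its row position, and uses base R's `tapply` for the grouped mean and count. The names `animal`, `position`, `mean.pos`, `freq` and `ranked` are made up for illustration, and the final sort (by mean rank, ties broken by higher frequency) is an assumption about how the ranking should be ordered.

```r
# Same sample data as in the question
Var1 <- c("Dog", "Cat", "Chicken", "Bird", "Roach")
Var2 <- c("Cat", "Dog", "Roach", "Turtle", "Bird")
Var3 <- c("Bird", "Cat", "Dog", "Roach", "Zebra")
animal.data <- data.frame(Var1, Var2, Var3, stringsAsFactors = FALSE)

# Flatten column by column; element names are dropped since only values matter
animal <- unlist(animal.data, use.names = FALSE)

# Row position (rank) of each flattened value: 1..nrow, repeated once per column
position <- rep(seq_len(nrow(animal.data)), times = ncol(animal.data))

# Grouped mean rank and count per animal (groups come out in alphabetical order)
mean.pos <- tapply(position, animal, mean)
freq <- tapply(position, animal, length)

# Assemble the summary; `ranked` is a hypothetical name
ranked <- data.frame(Mean = round(as.vector(mean.pos), 2),
                     Freq = as.vector(freq),
                     row.names = names(mean.pos))

# Assumed ordering: lowest mean rank first, ties broken by higher frequency
ranked <- ranked[order(ranked$Mean, -ranked$Freq), ]
print(ranked)
```

The Mean and Freq values match the loop-based output above; only the row order differs, because of the final sort. Whether this is actually faster on the full data set would need to be timed on that data.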