Search code examples
rrowranking

R create an ordered quintile for rows


I have a data frame that I would like to categorise the columns, across rows, as belonging to the first, second, third, fourth or fifth qunitile (a bit confusing I know but the example should clarify). This I have done but the problem is that, firstly not all factor levels are present in each variable and secondly, the factors are not ordered in the most logical way. Here are some test data.

x.df<-structure(list(Location = structure(1:6, .Label = c("Site A", 
"Site B", "Site C", "Site D", "Site E", "Site F"), class = "factor"), 
Var1 = c(78L, 5L, 85L, 87L, 89L, 82L), Var2 = c(98L, 5L, 
67L, 92L, 3L, 44L), Var3 = c(30L, 54L, 56L, 3L, 31L, 58L), 
Var4 = c(63L, 96L, 14L, 95L, 90L, 99L), Var5 = c(71L, 52L, 
78L, 93L, 74L, 26L), Var6 = c(21L, 66L, 57L, 42L, 39L, 69L
), Var7 = c(97L, 42L, 84L, 46L, 86L, 46L), Var8 = c(100L, 
99L, 6L, 41L, 94L, 20L), Var9 = c(84L, 82L, 26L, 91L, 38L, 
80L), Var10 = c(8L, 50L, 23L, 92L, 46L, 1L)), .Names = c("Location",
"Var1", "Var2", "Var3", "Var4", "Var5", "Var6", "Var7", "Var8", 
"Var9", "Var10"), class = "data.frame", row.names = c(NA, -6L
))

cut_fn<-function(x){cut(x,quantile(x, c(0.0,0.2,0.4,0.6,0.8,1.0)),include.lowest=TRUE, c("lowest","low","middle","high","highest"))}
r.df<-data.frame(t(apply(x.df[,-1], 1, cut_fn)))

Thus each row has two each of "highest","high", ..., "lowest".

r.df
       X1      X2     X3      X4      X5     X6      X7      X8     X9    X10
1  middle highest    low     low  middle lowest    high highest   high lowest
2  lowest  lowest middle highest  middle   high     low highest   high    low
3 highest    high middle  lowest    high middle highest  lowest    low    low
4  middle    high lowest highest highest    low     low  lowest middle   high
5    high  lowest lowest highest  middle    low    high highest    low middle
6 highest     low middle highest     low   high  middle  lowest   high lowest
str(r.df)
'data.frame':   6 obs. of  10 variables:
 $ X1 : Factor w/ 4 levels "high","highest",..: 4 3 2 4 1 2
 $ X2 : Factor w/ 4 levels "high","highest",..: 2 4 1 1 4 3
 $ X3 : Factor w/ 3 levels "low","lowest",..: 1 3 3 2 2 3
 $ X4 : Factor w/ 3 levels "highest","low",..: 2 1 3 1 1 1
 $ X5 : Factor w/ 4 levels "high","highest",..: 4 4 1 2 4 3
 $ X6 : Factor w/ 4 levels "high","low","lowest",..: 3 1 4 2 2 1
 $ X7 : Factor w/ 4 levels "high","highest",..: 1 3 2 3 1 4
 $ X8 : Factor w/ 2 levels "highest","lowest": 1 1 2 2 1 2
 $ X9 : Factor w/ 3 levels "high","low","middle": 1 1 2 3 2 1
 $ X10: Factor w/ 4 levels "high","low","lowest",..: 3 2 2 1 4 3

Ideally what I'd like is all variables having the (ordered) structure:

 $ X1 : Factor w/ 5 levels "highest","high",..: 

Solution

  • If I understand your question correctly, you would like to have each column ordered. This easiest way is to loop through all of the column converting them with the factor function with the ordered=TRUE option.
    Try this:

    #first create r.df with stringsAsFactors as false
    r.df<-data.frame(t(apply(x.df[,-1], 1, cut_fn)), stringsAsFactors = FALSE)
    
    #now loop across all of the columns creating an order factor list
    #lowest=1 while highest =5
    for(x in names(r.df)) {
      r.df[[x]]<-factor(r.df[[x]], levels=c("lowest","low","middle","high","highest"), ordered=TRUE)}
    }
    

    Now each column will have all five levels arranged in the proper order.