Ordering the rows of each data frame in a list by a specific column in R


My list of data frames looks like this:

DFlist <- list(df1 = data.frame(ID = sample(LETTERS, 5), V1 = sample(1:50, 5), V2 = sample(1:50,5), V3 = sample(1:50,5), cluster = rep("V2", 5)), 
           df2 = data.frame(ID = sample(LETTERS, 5), V1 = sample(1:50, 5), V2 = sample(1:50,5), V3 = sample(1:50,5), cluster = rep("V1", 5)),
           df3 = data.frame(ID = sample(LETTERS, 5), V1 = sample(1:50, 5), V2 = sample(1:50,5), V3 = sample(1:50,5), cluster = rep("V3", 5)))

I want to order the rows of each data frame in the list DFlist by the column whose name matches the first cell in column 5 (cluster) of that data frame. The code I am trying is below, but it is not working:

for (i in 1:length(DFlist)) {
DFlist[[i]] <- DFlist[[i]][with(DFlist[[i]], order(paste0(DFlist[[i]][1,5]), decreasing = T)),]
}

The following code works, but it is not practical because my real data contains many data frames:

DFlist[["df1"]] <- DFlist[["df1"]][with(DFlist[["df1"]], order(V2, decreasing = T)),]
DFlist[["df2"]] <- DFlist[["df2"]][with(DFlist[["df2"]], order(V1, decreasing = T)),]
DFlist[["df3"]] <- DFlist[["df3"]][with(DFlist[["df3"]], order(V3, decreasing = T)),]

Solution

  • You can try sort_by(), which was added to base R in version 4.4.0:

    > lapply(DFlist, \(x) with(x, sort_by(x, x[cluster[[1]]], decreasing = TRUE)))
    $df1
      ID V1 V2 V3 cluster
    3  D 43 49  9      V2
    5  A 18 46 21      V2
    4  G 14 42 15      V2
    1  N 34 33 10      V2
    2  Y 23 21  7      V2
    
    $df2
      ID V1 V2 V3 cluster
    4  Z 44  6 20      V1
    2  I 42 20 38      V1
    1  E 34 33 42      V1
    3  N 25 35 47      V1
    5  W 15 10 28      V1
    
    $df3
      ID V1 V2 V3 cluster
    1  T 44  6 45      V3
    5  H 42  2 38      V3
    3  W  6 32 22      V3
    2  L 25 24 18      V3
    4  F 39 14 14      V3
    

    Data

    set.seed(0)
    DFlist <- list(
        df1 = data.frame(ID = sample(LETTERS, 5), V1 = sample(1:50, 5), V2 = sample(1:50, 5), V3 = sample(1:50, 5), cluster = rep("V2", 5)),
        df2 = data.frame(ID = sample(LETTERS, 5), V1 = sample(1:50, 5), V2 = sample(1:50, 5), V3 = sample(1:50, 5), cluster = rep("V1", 5)),
        df3 = data.frame(ID = sample(LETTERS, 5), V1 = sample(1:50, 5), V2 = sample(1:50, 5), V3 = sample(1:50, 5), cluster = rep("V3", 5))
    )
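
The loop in the question most likely fails because `paste0(DFlist[[i]][1,5])` produces a single string such as "V2", so `order()` ranks that one string instead of the values in column V2 and returns 1, which keeps only the first row. If `sort_by()` is not available in your R version, a minimal base R sketch of the same idea could look like this. The helper name `order_by_cluster` and its `key_col` argument are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical helper (name and arguments are illustrative assumptions).
# Sorts one data frame by the column named in the first cell of `key_col`.
order_by_cluster <- function(df, key_col = "cluster") {
  # Read the name of the sort column, e.g. "V2", from the first row.
  sort_col <- as.character(df[[key_col]][1])
  # df[[sort_col]] extracts the column's values, so order() ranks the
  # values themselves rather than the column name.
  df[order(df[[sort_col]], decreasing = TRUE), , drop = FALSE]
}

# Same example data as above.
set.seed(0)
DFlist <- list(
  df1 = data.frame(ID = sample(LETTERS, 5), V1 = sample(1:50, 5), V2 = sample(1:50, 5), V3 = sample(1:50, 5), cluster = rep("V2", 5)),
  df2 = data.frame(ID = sample(LETTERS, 5), V1 = sample(1:50, 5), V2 = sample(1:50, 5), V3 = sample(1:50, 5), cluster = rep("V1", 5)),
  df3 = data.frame(ID = sample(LETTERS, 5), V1 = sample(1:50, 5), V2 = sample(1:50, 5), V3 = sample(1:50, 5), cluster = rep("V3", 5))
)

# Apply the helper to every data frame; list names are preserved.
sorted <- lapply(DFlist, order_by_cluster)
```

Because the helper takes the data frame as an argument, it scales to any number of data frames in the list, and the equivalent `for` loop would simply be `for (i in seq_along(DFlist)) DFlist[[i]] <- order_by_cluster(DFlist[[i]])`.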