
Sort a subset of columns for each row


I want to sort columns 4 through 10 of my.data (my.data[4:10]) in descending order within each row. There are some clues in the question "Sort second to fifth column for each row in R", but I could not follow it well enough to adapt it.

I also tried things like:

sort(my.data, decreasing = TRUE, partial = c([4:10]))

which didn't work, but I think the linked approach is more in line with what I need. I read through the ?cbind, ?apply, and ?sort help pages, but the examples are just too cryptic for me.

Here's my sample dataset:

habitat<-c('Marsh','Prairie','Savanna','Swamp','Woodland')
NumSites<-c(3,3,4,1,4)
NumSamples<-c(6,5,8,2,8)
Sp1<-c(NA,2,NA,2,1)
Sp2<-c(NA,2,1,NA,1)
Sp3<-c(NA,NA,NA,NA,1)
Sp4<-c(3,NA,NA,NA,NA)
Sp5<-c(NA,NA,3,NA,NA)
Sp6<-c(1,NA,67,NA,2)
Sp7<-c(NA,2,3,NA,1)

my.data<-data.frame(habitat,NumSites,NumSamples,Sp1,Sp2,Sp3,Sp4,Sp5,Sp6,Sp7)

# I suspect a variant of this must work:
# cbind(df[,1], t(apply(df[,-1], 1, sort)))

The desired result should look like:

habitat  NumSites NumSamples Sp1 Sp2 Sp3 Sp4 Sp5 Sp6 Sp7
Marsh    3        6          3   1   NA  NA  NA  NA  NA
Prairie  3        5          2   2   2   NA  NA  NA  NA
Savanna  4        8          67  3   3   1   NA  NA  NA
Swamp    1        2          2   NA  NA  NA  NA  NA  NA
Woodland 4        8          2   1   1   1   1   NA  NA

I feel like the cbind approach is close...

Also, my actual data has a varying number of columns and varying column names, so I want to select the columns by position (4:10) rather than by name.


Solution

  • This answer's approach, which you quote above, is close:

    cbind(df[,1], t(apply(df[,-1], 1, sort)))
    

    but it needed two changes:

    • You want to sort all but the first three columns, not all but the first. So change [,1] and [,-1] to [, 1:3] and [, -(1:3)], respectively.
    • By default, sort sorts in increasing order and drops NAs entirely, whereas you want decreasing order with the NAs last. You can fix both by passing the decreasing=TRUE and na.last=TRUE arguments to sort.
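
    To see the effect of those two arguments in isolation, here is a small sketch on a single vector. The names v, default.sorted and wanted are made up for illustration and are not part of the question's data:

    ```r
    # A toy vector with one missing value (hypothetical example data)
    v <- c(NA, 2, 1)

    # Default behaviour: increasing order, NA removed from the result
    default.sorted <- sort(v)

    # Decreasing order, with the NA kept and placed at the end
    wanted <- sort(v, decreasing = TRUE, na.last = TRUE)

    print(default.sorted)
    print(wanted)
    ```

    The first result has only two elements because the NA is dropped, while the second keeps all three, which is what lets apply build a rectangular result.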

    This makes the solution:

    cbind(my.data[, 1:3], t(apply(my.data[, -(1:3)], 1, function(v) sort(v, decreasing=TRUE, na.last=TRUE))))
    

    Note that it might be a bit clearer if you split it onto multiple lines:

    mysort = function(v) sort(v, decreasing=TRUE, na.last=TRUE)
    sorted.cols = t(apply(my.data[, -(1:3)], 1, mysort))
    cbind(my.data[, 1:3], sorted.cols)
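
One detail worth checking in the result is the column names: sort reorders a row's names along with its values, so the matrix returned by t(apply(...)) may not carry the original Sp1 to Sp7 names in their original order. A minimal sketch of one way around this, assuming it is acceptable to overwrite the sorted columns in a copy of the data frame, is to assign the sorted matrix back into the same column range. The names cols and sorted.data are introduced here for illustration only:

```r
# Rebuild the sample data from the question
my.data <- data.frame(
  habitat    = c('Marsh', 'Prairie', 'Savanna', 'Swamp', 'Woodland'),
  NumSites   = c(3, 3, 4, 1, 4),
  NumSamples = c(6, 5, 8, 2, 8),
  Sp1 = c(NA, 2, NA, 2, 1),
  Sp2 = c(NA, 2, 1, NA, 1),
  Sp3 = c(NA, NA, NA, NA, 1),
  Sp4 = c(3, NA, NA, NA, NA),
  Sp5 = c(NA, NA, 3, NA, NA),
  Sp6 = c(1, NA, 67, NA, 2),
  Sp7 = c(NA, 2, 3, NA, 1)
)

cols <- 4:10             # columns to sort, selected by position
sorted.data <- my.data   # work on a copy so the original stays intact

# Sort each row of the selected columns, largest first, NAs last.
# Assigning into the existing column range keeps the data frame's
# own column names rather than whatever names the matrix carries.
sorted.data[, cols] <- t(apply(my.data[, cols], 1, sort,
                               decreasing = TRUE, na.last = TRUE))

print(sorted.data)
```

Because the columns are selected by position, the same lines should carry over to data with different column names, as long as cols is adjusted to the right range.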