r dataframe subset data-cleaning

Find the nth largest values in the top row and omit the rest of the columns in R


I am trying to subset a data frame so that it keeps only those columns whose value in the first row is among the n largest values in that row.

For example, suppose I want to keep only the columns whose value in row 1 is among the 2 largest values in that row.

dat1 = data.frame(a = c(0.1,0.2,0.3,0.4,0.5), b = c(0.6,0.7,0.8,0.9,0.10), c = c(0.12,0.13,0.14,0.15,0.16), d = c(NA, NA, NA, NA, 0.5))

    a   b    c  d
1 0.1 0.6 0.12 NA
2 0.2 0.7 0.13 NA
3 0.3 0.8 0.14 NA
4 0.4 0.9 0.15 NA
5 0.5 0.1 0.16 0.5

The result should drop columns a and d, because their first-row values (0.1 and NA) are not among the 2 largest in row 1; 0.6 in column b and 0.12 in column c are both larger.

    b    c 
1 0.6 0.12 
2 0.7 0.13
3 0.8 0.14 
4 0.9 0.15 
5 0.1 0.16

Is there a simple way to subset this? I do not want to order it, because that will create problems with other data frames I have that are related.


Solution

  • Building on pieca's answer, you can wrap the logic in a function. This way, the returned data.frame also keeps its original column order instead of being sorted.

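    get_nth <- function(df, n) {
      # Coerce every column to numeric so first-row values are comparable
      df[] <- lapply(df, as.numeric)
      # Flatten row 1 to a named vector and sort it, largest first, dropping NAs
      first_row <- sort(unlist(df[1, ]), na.last = NA, decreasing = TRUE)
      # Names of the columns holding the n largest first-row values
      cols <- head(names(first_row), n)
      # Logical indexing keeps the columns in their original order
      df[names(df) %in% cols]
    }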
    

    Hope this works for you.
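
    As a quick check, here is a self-contained sketch that applies the function to the dat1 example from the question. The function is restated so the snippet runs on its own; this variant flattens the first row with unlist() before sorting, which is an assumption on my part rather than something the question requires. The n = 3 call is only there to show that the surviving columns stay in their original order.

    ```r
    # Example data from the question
    dat1 <- data.frame(a = c(0.1, 0.2, 0.3, 0.4, 0.5),
                       b = c(0.6, 0.7, 0.8, 0.9, 0.10),
                       c = c(0.12, 0.13, 0.14, 0.15, 0.16),
                       d = c(NA, NA, NA, NA, 0.5))

    # Variant of get_nth: keep the columns whose first-row value is
    # among the n largest, without reordering the data frame
    get_nth <- function(df, n) {
      df[] <- lapply(df, as.numeric)
      # Named vector of row 1, sorted largest first, NAs dropped
      first_row <- sort(unlist(df[1, ]), na.last = NA, decreasing = TRUE)
      cols <- head(names(first_row), n)
      df[names(df) %in% cols]
    }

    top2 <- get_nth(dat1, 2)
    names(top2)   # "b" "c"

    top3 <- get_nth(dat1, 3)
    names(top3)   # "a" "b" "c" (original order, not sorted by value)
    ```

    Column d never qualifies here because its first-row value is NA and na.last = NA drops it before the top n are picked.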