Get variable value in subset function - R


I ran into a problem while trying to use the value of a variable inside the subset function. When I run the code below, I get the message "Warning: Error in -: invalid argument to unary operator". It seems that val in -c(val) inside subset is not being treated as the column name stored in the variable defined above.

cname <- c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10",
           "A11","A12","A13","A14","A15","A16","A17","A18","A19","A20",
           "A21","A22","A23","A24","A25","A26","A27","A28","A29","A30","A31")
    
for (i in 15:length(cname)) {
    val <- cname[i]
    ifelse(sum(!is.na(df2$val))==0, 
           df2 <- subset(df2, select = -c(val)), 
           df2)
}

My expected result is df2 with the unnecessary columns, those that contain only NA values, removed.

How can I get the value from val, so I can remove the columns that have only NA values?


Solution

  • We can use subset without a loop. Apply the vectorized colSums to the logical matrix is.na(df2) to count the NAs in each column, compare (!=) those counts with the number of rows (nrow(df2)) to get a logical vector that is TRUE for every column with at least one non-NA value, use it to subset the column names, and pass the result to the select argument of subset.

    subset(df2, select = names(df2)[colSums(is.na(df2)) != nrow(df2)])
    

    -output

      A1 A2 A4 A5
    1  1  1 NA 10
    2  2  2 NA 10
    3  3  3 NA 10
    4  4 NA  3 10
    5  5  5  2 10
    

    Or with tidyverse: use select with where to keep only the columns that contain at least one non-NA element.

    library(dplyr)
    df2 %>%
       select(where(~ any(!is.na(.x))))
    

    -output

      A1 A2 A4 A5
    1  1  1 NA 10
    2  2  2 NA 10
    3  3  3 NA 10
    4  4 NA  3 10
    5  5  5  2 10
    

    data

    df2 <- data.frame(A1 = 1:5, A2 = c(1:3, NA, 5), A3 = NA_integer_,
                      A4 = c(NA, NA, NA, 3, 2), A5 = 10)
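
If the loop from the question is still wanted, here is a minimal sketch of one way to make it work. It relies on two points about the original code. First, df2$val looks for a column literally named "val", so it returns NULL and the NA check is always TRUE. Second, inside subset, -c(val) ends up negating a character string, which is what triggers the unary operator error. Using df2[[val]] looks the column up by the name stored in val. The shortened cname (taken from names(df2)) and the loop over all columns rather than 15:length(cname) are assumptions made to fit the sample data.

```r
# Sample data copied from the "data" section of the answer
df2 <- data.frame(A1 = 1:5, A2 = c(1:3, NA, 5), A3 = NA_integer_,
                  A4 = c(NA, NA, NA, 3, 2), A5 = 10)

# Assumption: use the existing column names instead of the A1..A31 vector
cname <- names(df2)

for (i in seq_along(cname)) {
  val <- cname[i]
  # df2[[val]] fetches the column whose name is stored in val
  if (all(is.na(df2[[val]]))) {
    # Assigning NULL drops that column from the data frame
    df2[[val]] <- NULL
  }
}

names(df2)
```

A plain if is used instead of ifelse because ifelse is meant for element-wise selection on vectors, not for control flow. The vectorized subset and select versions above remain the shorter option.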