Search code examples
rrangelapplynegative-number

Select/Get names of all columns which have a negative value between 0-10


For a dataframe, I would like to get the names of, or select, all the columns which have negative values within a certain range. This post comes pretty close, but it iterates trough the rows which is not feasible for my data. In addition, if I store that solution it becomes a list, where I would prefer a vector. As an example, for the following dataset:

library(data.table)
df <- fread(
     "A   B   D   E  iso   year   
      0   1   1   NA ECU   2009   
      1   0   2   0  ECU   2009   
      0   0   -3  0  BRA   2011   
      1   0   4   0  BRA   2011   
      0   1   7   NA ECU   2008   
     -1   0   1   0  ECU   2008   
      0   0   3   2  BRA   2012   
      1   0   4   NA BRA   2012",
  header = TRUE
)

I would want to have the names of all columns which have negative values between 0 and 10 (A and D in the example). What would be the easiest solution to achieve this? Everything else equal a data.table solution would be preferred.


Solution

  • One tidyverse possibility could be:

     df %>%
     gather(var, val, -c(5:6)) %>%
     group_by(var) %>%
     summarise(res = any(val[!is.na(val)] > -10 & val[!is.na(val)] < 0))
    
      var   res  
      <chr> <lgl>
    1 A     TRUE 
    2 B     FALSE
    3 D     TRUE 
    4 E     FALSE
    

    To select only numeric columns:

    df %>%
     select_if(is.numeric) %>%
     gather(var, val) %>%
     group_by(var) %>%
     summarise(res = any(val[!is.na(val)] > -10 & val[!is.na(val)] < 0)) 
    

    Note that it also selects "year" column as it is a numeric column.

    You can do this also with base R:

    df <- Filter(is.numeric, df)
    cond <- as.logical(colSums(df > -10, na.rm = TRUE) *
                        colSums(df < -0, na.rm = TRUE))
    colnames(df[, cond])
    
    [1] "A" "D"
    

    Or written as a "one-liner":

    df <- Filter(is.numeric, df)
    colnames(df[, as.logical(colSums(df > -10, na.rm = TRUE) * colSums(df < -0, na.rm = TRUE))])
    

    Sample data:

    df <- read.table(text = 
     "A   B   D   E  iso   year   
          0   1   1   NA ECU   2009   
          1   0   2   0  ECU   2009   
          0   0   -3  0  BRA   2011   
          1   0   4   0  BRA   2011   
          0   1   7   NA ECU   2008   
         -1   0   1   0  ECU   2008   
          0   0   3   2  BRA   2012   
          1   0   4   NA BRA   2012", 
     header = TRUE,
     stringsAsFactors = FALSE)