For a dataframe, I would like to get the names of, or select, all the columns which have negative values within a certain range. This post comes pretty close, but it iterates trough the rows which is not feasible for my data. In addition, if I store that solution it becomes a list, where I would prefer a vector. As an example, for the following dataset:
library(data.table)
df <- fread(
"A B D E iso year
0 1 1 NA ECU 2009
1 0 2 0 ECU 2009
0 0 -3 0 BRA 2011
1 0 4 0 BRA 2011
0 1 7 NA ECU 2008
-1 0 1 0 ECU 2008
0 0 3 2 BRA 2012
1 0 4 NA BRA 2012",
header = TRUE
)
I would want to have the names of all columns which have negative values between 0 and 10 (A and D in the example). What would be the easiest solution to achieve this? Everything else equal a data.table solution would be preferred.
One tidyverse
possibility could be:
df %>%
gather(var, val, -c(5:6)) %>%
group_by(var) %>%
summarise(res = any(val[!is.na(val)] > -10 & val[!is.na(val)] < 0))
var res
<chr> <lgl>
1 A TRUE
2 B FALSE
3 D TRUE
4 E FALSE
To select only numeric columns:
df %>%
select_if(is.numeric) %>%
gather(var, val) %>%
group_by(var) %>%
summarise(res = any(val[!is.na(val)] > -10 & val[!is.na(val)] < 0))
Note that it also selects "year" column as it is a numeric column.
You can do this also with base R
:
df <- Filter(is.numeric, df)
cond <- as.logical(colSums(df > -10, na.rm = TRUE) *
colSums(df < -0, na.rm = TRUE))
colnames(df[, cond])
[1] "A" "D"
Or written as a "one-liner":
df <- Filter(is.numeric, df)
colnames(df[, as.logical(colSums(df > -10, na.rm = TRUE) * colSums(df < -0, na.rm = TRUE))])
Sample data:
df <- read.table(text =
"A B D E iso year
0 1 1 NA ECU 2009
1 0 2 0 ECU 2009
0 0 -3 0 BRA 2011
1 0 4 0 BRA 2011
0 1 7 NA ECU 2008
-1 0 1 0 ECU 2008
0 0 3 2 BRA 2012
1 0 4 NA BRA 2012",
header = TRUE,
stringsAsFactors = FALSE)