I have a dataframe that I want to subset inside a function so that only rows where both columns are either 1 or NA remain. For df:
df <- data.frame(a = c(1,1,0,NA,0,1),
b = c(0,1,0,1,0, NA),
c = c(0,0,0,0,0,0))
I want:
a b c
2 1 1 0
4 NA 1 0
6 1 NA 0
The problem I'm having is I have many columns with names that change. So this works well:
subset(df, (is.na(a) | a == 1) & (is.na(b) | b == 1))
but when column names 'a' and 'b' become 'd' and 'f' during the operation of the function it breaks. Specifying by column index works more robustly:
subset(df, (is.na(df[,1]) | df[,1] == 1) & (is.na(df[,2]) | df[,2] == 1))
But is cumbersome, and if a previous processing step messes up and column 'c' ends up before 'a' or 'b' I end up subsetting by the wrong columns.
I also have another dataframe that specifies what the column names to subset by will be:
cro_df <- data.frame(pop = c('c1', 'c2'),
p1 = c('a', 'd'),
p2 = c('b', 'f'))
pop p1 p2
1 c1 a d
2 c2 b f
I would like to be able to extract the column names from that dataframe to use in my subset function, e.g.:
col1 <- cro_df[cro_df[,'pop']=='c1', 'p1']
subset(df, is.na(col1) | col1 == 1)
This returns an empty dataframe. I have tried turning col1 into a symbol and a factor with no success:
subset(df, as.symbol(col1) == 1)
subset(df, sym(col1) == 1)
subset(df, as.factor(col1) == 1)
And they all return:
[1] a b c
<0 rows> (or 0-length row.names)
Is there a way I can specify my columns to subset using the second dataframe cro_df?
You can use filter
and if_all
from the dplyr
package.
Select in the manner you find best suited for your case the names of the columns you want to filter. In my case I just created a variable cols
that contains 'a'
and 'b'
.
Then I check all_of
the column names in cols
and filter
the rows if_all
statements are TRUE
:
library(dplyr) # packageVersion("dplyr") >= 1.1.0
cols <- c('a', 'b')
filter(df, if_all(all_of(cols), \(x) is.na(x) | x == 1))
#> a b c
#> 1 1 1 0
#> 2 NA 1 0
#> 3 1 NA 0
If you assign different column names to cols
you can reuse the same code.