r dataframe dplyr subset data-cleaning

Create subset of the sample by different variables simultaneously


I have the following data frame. Variables a and b are continuous, and variables v1-v7 are binary.

> df <- data.frame(a= c(1,1,2,3,5),
+                      b  = c(3, 6,8, 2, 4),
+                      v1 = c(0,0,0,0,0),
+                      v2 = c(1,0,0,0,0),
+                      v3 = c(0,1,1,1,1),
+                      v4 = c(0,1,1,1,1),
+                      v5 = c(0,0,0,0,1),
+                      v6 = c(0,0,0,0,0),
+                      v7 = c(0,0,0,0,0))
> df
  a b v1 v2 v3 v4 v5 v6 v7
1 1 3  0  1  0  0  0  0  0
2 1 6  0  0  1  1  0  0  0
3 2 8  0  0  1  1  0  0  0
4 3 2  0  0  1  1  0  0  0
5 5 4  0  0  1  1  1  0  0
> 

I want to create seven subsamples from the data frame above, one for each of v1-v7. Each subsample should include only variables a and b, restricted to the rows where the corresponding variable equals 1. For example,

> df1 <- df %>% filter(v1==1)
> df1
[1] a  b  v1 v2 v3 v4 v5 v6 v7
<0 rows> (or 0-length row.names)
> df2 <- df %>% filter(v2==1)
> df2
  a b v1 v2 v3 v4 v5 v6 v7
1 1 3  0  1  0  0  0  0  0
> df3 <- df %>% filter(v3==1)
> df3
  a b v1 v2 v3 v4 v5 v6 v7
1 1 6  0  0  1  1  0  0  0
2 2 8  0  0  1  1  0  0  0
3 3 2  0  0  1  1  0  0  0
4 5 4  0  0  1  1  1  0  0

How can I create all seven subsamples at once in R, instead of writing one filter call per variable? Thanks.


Solution

  • In dplyr, you can refer to a column by a name held in a character string through the .data pronoun (see data masking), so the filter can be built inside a loop:

    library(dplyr)

    # One filtered data frame per indicator column v1..v7
    df_samples <- vector("list", 7)
    for (i in 1:7)
      df_samples[[i]] <- filter(df, .data[[paste0("v", i)]] == 1)
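
As a variation on the loop above, here is a minimal base R sketch (no dplyr) that also keeps only columns a and b, as the question asks, and names each list element after its indicator column. The names flag_cols and subsamples are my own choices for illustration, not anything from the original answer.

```r
# Example data from the question
df <- data.frame(a  = c(1, 1, 2, 3, 5),
                 b  = c(3, 6, 8, 2, 4),
                 v1 = c(0, 0, 0, 0, 0),
                 v2 = c(1, 0, 0, 0, 0),
                 v3 = c(0, 1, 1, 1, 1),
                 v4 = c(0, 1, 1, 1, 1),
                 v5 = c(0, 0, 0, 0, 1),
                 v6 = c(0, 0, 0, 0, 0),
                 v7 = c(0, 0, 0, 0, 0))

# Names of the binary indicator columns (hypothetical helper variable)
flag_cols <- paste0("v", 1:7)

# One subsample per indicator: rows where it equals 1, columns a and b only
subsamples <- lapply(flag_cols, function(v) df[df[[v]] == 1, c("a", "b")])
names(subsamples) <- flag_cols

subsamples$v2   # the single row where v2 == 1
subsamples$v1   # zero rows, because v1 is never 1
```

Keeping the results in one named list, rather than creating df1 to df7 as separate objects, makes it easy to apply the same operation to every subsample later, for example with sapply(subsamples, nrow).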