
filter_at() not working with -starts_with()


I have a dataset with multiple samples (columns) and variables (rows). I want to filter the dataset to find the variables that are unique to a particular set of samples.

This is the sample data frame:

dput(df)
structure(list(Description=c("k__Bacteria;__;__;__;__","k__Bacteria;p__Acidobacteria;c__[Chloracidobacteria];o__RB41;f__Ellin6075", 
"k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Koribacteraceae", 
"k__Bacteria;p__Acidobacteria;c__DA052;o__Ellin6513;f__", "k__Bacteria;p__Acidobacteria;c__Solibacteres;o__Solibacterales;f__", 
"k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__", 
"k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__", 
"k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Actinomycetaceae", 
"k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Actinopolysporaceae", 
"k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Corynebacteriaceae"
), ADZU.3 = c(2651L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 12L), ADZU.4 = c(2439L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 5L), BEP.3 = c(11452L, 9L, 5L, 
0L, 0L, 6L, 14L, 0L, 0L, 83L), BEP.4 = c(4168L, 0L, 0L, 9L, 3L, 
0L, 0L, 5L, 6L, 61L), Hya.1 = c(15179L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 94L), Hya.2 = c(4525L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
34L)), row.names = c(NA, 10L), class = "data.frame")

I am using the filter_at() function in dplyr. I have many samples whose names start with different letters (A, B, H, etc.), and I want to find the variables that are unique to samples starting with the same letter (for example, the letter B).

The following code currently works well:

##code set 1, this code works

df.bep<-filter_at(df,vars(starts_with("A"),starts_with("H")), 
all_vars(.==0))

The result of this code is the following, which is what I expect to see:

dput(df.bep)
structure(list(Description = c("k__Bacteria;p__Acidobacteria;c__[Chloracidobacteria];o__RB41;f__Ellin6075", 
"k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Koribacteraceae", 
"k__Bacteria;p__Acidobacteria;c__DA052;o__Ellin6513;f__", "k__Bacteria;p__Acidobacteria;c__Solibacteres;o__Solibacterales;f__", 
"k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__", 
"k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__", 
"k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Actinomycetaceae", 
"k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Actinopolysporaceae"
), ADZU.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), ADZU.4 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L), BEP.3 = c(9L, 5L, 0L, 0L, 6L, 14L, 
0L, 0L), BEP.4 = c(0L, 0L, 9L, 3L, 0L, 0L, 5L, 6L), Hya.1 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L), Hya.2 = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L)), row.names = c(NA, -8L), class = "data.frame")

The issue is that for larger datasets with many different samples, specifying every letter for every sample I want to pass to filter_at() becomes cumbersome.

I modified the script to use -starts_with() so that the filter applies to every sample except those starting with a specific letter (for example, all samples except those that start with the letter B):

df.bep.2<-filter_at(df,vars(-starts_with("B")),all_vars(.==0))

However, this second version doesn't work as intended. I don't get any errors, but I get an empty data frame:

dput(df.bep.2)
structure(list(Description = character(0), ADZU.3 = integer(0), 
ADZU.4 = integer(0), BEP.3 = integer(0), BEP.4 = integer(0), 
Hya.1 = integer(0), Hya.2 = integer(0)), row.names = c(NA, 
0L), class = "data.frame")

Is there something additional I need to add to the code when combining filter_at() and -starts_with()?


Solution

  • That means the condition in all_vars() is not met across the columns that do not start with "B". The filter checks every column that doesn't start with "B" and keeps only the rows where all of those columns equal 0.

    For example, the mtcars dataset returns no rows with this condition:

    mtcars %>%
      filter_at(vars(-starts_with("q")), all_vars(. == 0))
    
     [1] mpg  cyl  disp hp   drat wt   qsec vs   am   gear carb
    <0 rows> (or 0-length row.names)
    

    Unless we add a row containing only 0's (the qsec column could be non-zero, since it is excluded from the check):

    mtcars %>%
      bind_rows(setNames(rep(0, ncol(.)), names(.))) %>%
      filter_at(vars(-starts_with("q")), all_vars(. == 0))
    
      mpg cyl disp hp drat wt qsec vs am gear carb
    1   0   0    0  0    0  0    0  0  0    0    0
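
The two mtcars results above can be checked by hand. The following is a minimal base R sketch (the names `checked` and `all_zero` are illustrative, not part of the original answer) that lists the columns a negative selection such as -starts_with("q") leaves in play and counts the rows in which all of them are 0:

```r
# Columns that a negative selection like -starts_with("q") would check:
# every column except qsec
checked <- names(mtcars)[!startsWith(names(mtcars), "q")]

# For each row, TRUE only if every checked column equals 0
all_zero <- rowSums(mtcars[, checked] != 0) == 0

checked
sum(all_zero)  # number of rows that would survive the filter
```

Because no row of mtcars is zero in all ten checked columns, the count is 0, which matches the empty result.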
    

    EDIT: For your specific problem, the cause is the Description column. It does not start with "B", so -starts_with("B") selects it, and as a character column it never equals 0. There are probably several solutions, but here are two that should work for you:

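    # Option 1: also exclude the Description column from the check
    df %>%
      filter_at(vars(-starts_with("B"), -one_of("Description")), all_vars(. == 0))
    
    # Option 2: check only numeric columns whose names do not start with "B"
    df %>%
      filter_if(sapply(., is.numeric) & !startsWith(names(.), "B"), all_vars(. == 0))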
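
As a possible alternative, the same idea can be written with filter() and if_all(), since the filter_at()/filter_if() family is superseded in recent dplyr releases. This is only a sketch: it assumes dplyr 1.0.4 or later (where if_all() is available), and `df_demo` is a small hypothetical stand-in for the question's data, not the original data frame.

```r
library(dplyr)

# Hypothetical stand-in for the question's data: same column layout, fewer rows
df_demo <- data.frame(
  Description = c("taxon_1", "taxon_2", "taxon_3"),
  ADZU.3 = c(2651L, 0L, 12L),
  BEP.3  = c(11452L, 9L, 83L),
  Hya.1  = c(15179L, 0L, 94L)
)

# The negative selection still includes Description, which can never equal 0
selected <- names(select(df_demo, -starts_with("B")))
selected

# Keep rows where every numeric column not starting with "B" equals 0
result <- df_demo %>%
  filter(if_all(where(is.numeric) & !starts_with("B"), ~ .x == 0))
result
```

Restricting the selection with where(is.numeric) means any other character columns are skipped automatically, so they do not need to be excluded one by one.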