Search code examples
rdataframedplyrtidyverse

Filter a dataframe based on condition in columns selected by name pattern


I have a dataframe that contains several columns

# Temp Data
library(dplyr)

df <- as.data.frame(matrix(seq(1:40),ncol=6,nrow=40))
colnames(df) <- c("A_logFC", "B_logFC", "C_logFC", "A_qvalue", "B_qvalue", "C_qvalue")

I would like to filter out those rows that have a qvalue lower than a threshold in all conditions (A, B, C).

I could do the obvious by filtering each column separately

df %>%
  filter(A_qvalue < 0.05 & B_qvalue < 0.05 & C_qvalue < 0.05)

but the real dataframe has 15 columns with q-values.

I also tried reshaping the dataframe (as found here)

df_ID = DEGs_df %>% mutate(ID = 1:n())

df_ID %>%
  select(contains("qval"), ID) %>% 
  gather(variable, value, -ID) %>% 
  filter(value < 0.05) %>%
  semi_join(df_ID)

but then I cannot filter for those rows whose q-value is below the threshold in all conditions.

Conceptually, it would be something like

df %>%
  filter(grep("q_value",.) < 0.05) 

but this does not work either.

Any suggestions on how to solve that? Thank you in advance!


Solution

  • You can filter multiple columns at once using if_all:

    library(dplyr)
    
    df %>%
      filter(if_all(matches("_qvalue"), ~ . < 0.05))
    

    In this case I use the filtering condition x < 0.05 on all columns whose name matches _qvalue.

    Your second approach can also work if you group by ID first and then use all inside filter:

    df_ID = df %>% mutate(ID = 1:n())
    
    df_ID %>%
      select(contains("qval"), ID) %>% 
      gather(variable, value, -ID) %>% 
      group_by(ID) %>% 
      filter(all(value < 0.05)) %>%
      semi_join(df_ID, by = "ID")