
Subset the rows in a dataframe that match multiple conditions


I have a dataframe like the one below:

dput(trans_eqtl[1:3,1:10])
structure(list(Gene = c("ENSG00000132819", "ENSG00000101162", 
"ENSG00000132819"), `Gene-Chr` = c(20, 20, 20), `Gene-Pos` = c(55975426, 
57598009, 55975426), RsId = c("rs6084653", "rs156356", "rs1741314"
), `SNP-Chr` = c(20, 20, 20), `SNP-Pos` = c(4157072, 1819280, 
4155193), start = c(57391407, 59019254, 57391407), end = c(57409333, 
59025466, 57409333), Ds_cismb = c(56391407, 58019254, 56391407
), De_cismb = c(58409333, 60025466, 58409333)), row.names = c(NA, 
3L), class = "data.frame")

I am trying to keep only the rows that match the following condition:

I want to filter SNPs by their position: if the SNP position (SNP-Pos) is greater than De_cismb or less than Ds_cismb, keep the row and add it to the table trans_snp.

I tried this code but it doesn't give me the right subset:

##check for trans_Snp

trans_snp <- NULL
for(i in 1:dim(trans_eqtl)[1]){
  if((trans_eqtl$`SNP-Pos`[i] > trans_eqtl$De_cismb[i])==TRUE | (trans_eqtl$`SNP-Pos`[i] < trans_eqtl$Ds_cismb[i])==TRUE){
    x <- which(trans_eqtl$`SNP-Pos`[i] > trans_eqtl$De_cismb[i])
    y <- which(trans_eqtl$`SNP-Pos`[i] < trans_eqtl$Ds_cismb[i])
    value <- trans_eqtl[x,]
    value <- trans_eqtl[y,]
  }

  trans_snp <- rbind(trans_snp,value)
}

This is the output dataframe that I am getting:

dput(trans_snp[1:4,1:10])
structure(list(Gene = c("ENSG00000132819", "ENSG00000132819", 
"ENSG00000132819", "ENSG00000132819"), `Gene-Chr` = c(20, 20, 
20, 20), `Gene-Pos` = c(55975426, 55975426, 55975426, 55975426
), RsId = c("rs6084653", "rs6084653", "rs6084653", "rs6084653"
), `SNP-Chr` = c(20, 20, 20, 20), `SNP-Pos` = c(4157072, 4157072, 
4157072, 4157072), start = c(57391407, 57391407, 57391407, 57391407
), end = c(57409333, 57409333, 57409333, 57409333), Ds_cismb = c(56391407, 
56391407, 56391407, 56391407), De_cismb = c(58409333, 58409333, 
58409333, 58409333)), row.names = c(NA, 4L), class = "data.frame")

It is filled only with the first row of the input dataframe. Does anyone know where I am making the mistake?
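A sketch of where the loop goes wrong, run on the sample rows plus one made-up row (rs_example, hypothetical and not in the original data) whose SNP lies inside the window. Inside the loop both sides of each comparison are single values, so which() can only return 1 (comparison TRUE) or integer(0) (comparison FALSE). trans_eqtl[y, ] is therefore the first row of the whole data frame rather than row i. The second assignment also overwrites the first, and rbind() runs on every iteration, even when the if condition is FALSE. Indexing with the loop counter i and appending only inside the if should fix it:

```r
# Sample rows from the question (only the columns the condition uses),
# plus a HYPOTHETICAL fourth row whose SNP lies inside the cis window
trans_eqtl <- data.frame(
  RsId      = c("rs6084653", "rs156356", "rs1741314", "rs_example"),
  `SNP-Pos` = c(4157072, 1819280, 4155193, 57000000),
  Ds_cismb  = c(56391407, 58019254, 56391407, 56391407),
  De_cismb  = c(58409333, 60025466, 58409333, 58409333),
  check.names = FALSE,       # keep the hyphenated column name as-is
  stringsAsFactors = FALSE
)

# What the original loop computes for i = 2: comparisons of single values
i <- 2
x <- which(trans_eqtl$`SNP-Pos`[i] > trans_eqtl$De_cismb[i])  # integer(0)
y <- which(trans_eqtl$`SNP-Pos`[i] < trans_eqtl$Ds_cismb[i])  # 1, not 2
print(x)
print(y)

# Corrected loop: index by i and append only rows that meet the condition
trans_snp <- NULL
for (i in seq_len(nrow(trans_eqtl))) {
  pos <- trans_eqtl$`SNP-Pos`[i]
  if (pos > trans_eqtl$De_cismb[i] || pos < trans_eqtl$Ds_cismb[i]) {
    trans_snp <- rbind(trans_snp, trans_eqtl[i, ])
  }
}
print(trans_snp)
```

Growing a data frame with rbind() inside a loop rebuilds the result on every iteration, so for large tables the vectorized filter is the better choice.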


Solution

  • In dplyr, filter() evaluates the condition for all rows at once, so no loop is needed:

    library(dplyr)
    
    trans_eqtl %>%
      # keep rows whose SNP lies outside the [Ds_cismb, De_cismb] window
      filter(`SNP-Pos` > De_cismb | `SNP-Pos` < Ds_cismb) -> trans_snp
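The same vectorized condition also works in base R without dplyr, either with logical indexing or with subset(). A sketch on the sample columns, again with a hypothetical fourth row (rs_example, not in the original data) that should be dropped:

```r
# Sample rows from the question plus a HYPOTHETICAL row inside the window
trans_eqtl <- data.frame(
  RsId      = c("rs6084653", "rs156356", "rs1741314", "rs_example"),
  `SNP-Pos` = c(4157072, 1819280, 4155193, 57000000),
  Ds_cismb  = c(56391407, 58019254, 56391407, 56391407),
  De_cismb  = c(58409333, 60025466, 58409333, 58409333),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per row: does the SNP fall outside [Ds_cismb, De_cismb]?
# The vectorized `|` is required; `||` only handles length-one values
keep <- trans_eqtl$`SNP-Pos` > trans_eqtl$De_cismb |
  trans_eqtl$`SNP-Pos` < trans_eqtl$Ds_cismb
trans_snp <- trans_eqtl[keep, ]

# Equivalent with subset(); backticks are needed for the hyphenated name
trans_snp2 <- subset(trans_eqtl, `SNP-Pos` > De_cismb | `SNP-Pos` < Ds_cismb)

print(trans_snp)
```

One difference to keep in mind: if a position is NA, trans_eqtl[keep, ] returns an all-NA row for it, while subset() and dplyr::filter() drop such rows.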