
Keep rows with values in specific columns?


I have a table like the one below, produced by merging several datasets. I would like to keep only the rows that have a value (rather than ".") in all of the "-log10.qvalue" columns.

input:

chr_enhancer Start_enhancer End_enhancer                     source_enhancer.x                      source_marks.x -log10.qvalue.x                     source_enhancer.y -log10.qvalue.y                       source_enhancer -log10.qvalue
       chr1      100036100    100036650 Reilly_Hu_12Opcw_H3K4me2_rep1_2_merge                                .               .   Reilly_Hu_12Opcw_H3K4me2_rep1_2_merge               . Reilly_Hu_12Opcw_H3K4me2_rep1_2_merge             .
       chr1      100042226    100043575 Reilly_Hu_12Fpcw_H3K4me2_rep1_2_merge Fetalbrain.H3K4me1_narrow_peak_2501         2.145 Reilly_Hu_12Fpcw_H3K4me2_rep1_2_merge              .    Reilly_Hu_12Fpcw_H3K4me2_rep1_2_merge             1.254
       chr1      100042310    100043300            enhancer_atlas_Fetal_brain Fetalbrain.H3K4me1_narrow_peak_2501         5.145        enhancer_atlas_Fetal_brain               1.356            enhancer_atlas_Fetal_brain             6.325 

output:

 chr1      100042310    100043300            enhancer_atlas_Fetal_brain Fetalbrain.H3K4me1_narrow_peak_2501         5.145        enhancer_atlas_Fetal_brain               1.356            enhancer_atlas_Fetal_brain             6.325 

Solution

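  • First, select all columns whose names contain "log10." using 'grepl'. read.table renames "-log10.qvalue.x" to "X.log10.qvalue.x", so the pattern should not include the leading "-". Then use 'apply' to count, for each row, how many of those cells are "." or NA.
    Finally, keep the rows where that count is zero, i.e. sum(x == '.' | is.na(x)) == 0.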

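    # "log10." columns; fixed = TRUE makes the dot match literally instead of any character
    qcols <- grepl('log10.', names(df), fixed = TRUE)
    # keep the rows in which no q-value cell is "." or NA
    df[apply(df[, qcols], 1, function(x) sum(x == '.' | is.na(x)) == 0), ]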
    
    
    chr_enhancer Start_enhancer End_enhancer   source_enhancer.x     source_marks.x X.log10.qvalue.x
    3         chr1      100042310    100043300 enhancer_atlas_Fetal_brain Fetalbrain.H3K4me1_narrow_peak_2501            5.145  
    source_enhancer.y X.log10.qvalue.y            source_enhancer X.log10.qvalue
    3 enhancer_atlas_Fetal_brain            1.356 enhancer_atlas_Fetal_brain          6.325
    

    Data

     df <- read.table(text="chr_enhancer Start_enhancer End_enhancer                     source_enhancer.x                      source_marks.x -log10.qvalue.x                     source_enhancer.y -log10.qvalue.y                       source_enhancer -log10.qvalue
           chr1      100036100    100036650 Reilly_Hu_12Opcw_H3K4me2_rep1_2_merge                                .               .   Reilly_Hu_12Opcw_H3K4me2_rep1_2_merge               . Reilly_Hu_12Opcw_H3K4me2_rep1_2_merge             .
           chr1      100042226    100043575 Reilly_Hu_12Fpcw_H3K4me2_rep1_2_merge Fetalbrain.H3K4me1_narrow_peak_2501         2.145 Reilly_Hu_12Fpcw_H3K4me2_rep1_2_merge              .    Reilly_Hu_12Fpcw_H3K4me2_rep1_2_merge             1.254
           chr1      100042310    100043300            enhancer_atlas_Fetal_brain Fetalbrain.H3K4me1_narrow_peak_2501         5.145        enhancer_atlas_Fetal_brain               1.356            enhancer_atlas_Fetal_brain             6.325 
                     ", header = TRUE)
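An alternative, assuming "." is only ever used as a placeholder for a missing value: tell read.table to read "." as NA through its na.strings argument, so the q-value columns come in as numbers, and then keep the rows that are complete in those columns with complete.cases. This is a minimal sketch on the same data; the names qcols and res are only illustrative.

```r
# Same data as above; na.strings = "." stores the "." placeholders as NA,
# so the q-value columns are converted to numeric instead of character.
df <- read.table(text = "chr_enhancer Start_enhancer End_enhancer source_enhancer.x source_marks.x -log10.qvalue.x source_enhancer.y -log10.qvalue.y source_enhancer -log10.qvalue
chr1 100036100 100036650 Reilly_Hu_12Opcw_H3K4me2_rep1_2_merge . . Reilly_Hu_12Opcw_H3K4me2_rep1_2_merge . Reilly_Hu_12Opcw_H3K4me2_rep1_2_merge .
chr1 100042226 100043575 Reilly_Hu_12Fpcw_H3K4me2_rep1_2_merge Fetalbrain.H3K4me1_narrow_peak_2501 2.145 Reilly_Hu_12Fpcw_H3K4me2_rep1_2_merge . Reilly_Hu_12Fpcw_H3K4me2_rep1_2_merge 1.254
chr1 100042310 100043300 enhancer_atlas_Fetal_brain Fetalbrain.H3K4me1_narrow_peak_2501 5.145 enhancer_atlas_Fetal_brain 1.356 enhancer_atlas_Fetal_brain 6.325",
                 header = TRUE, na.strings = ".")

# Columns whose (renamed) names contain "log10."; fixed = TRUE matches the dot literally
qcols <- grepl("log10.", names(df), fixed = TRUE)

# complete.cases() is TRUE only for rows with no NA in the selected columns
res <- df[complete.cases(df[, qcols]), ]
print(res)
```

na.strings applies to every column, so the "." in source_marks.x also becomes NA; that does not affect the filter, which only looks at the q-value columns. A side benefit is that the q-values stay numeric, so they can be compared against a threshold afterwards.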