r, dataframe, count, missing-data

How to count missing values from two columns in R


I have a data frame that looks like this:

    Contig_A    Contig_B
    Contig_0    Contig_1
    Contig_3    Contig_5
    Contig_4    Contig_1
    Contig_9    Contig_0
  

I want to count how many contig IDs (from Contig_0 to Contig_1193) are not present in either the Contig_A or the Contig_B column.

For example, if there were only 10 contigs in total for this data frame (Contig_0 to Contig_9), the answer would be 4 (Contig_2, Contig_6, Contig_7, Contig_8).


Solution

  • Create a vector of every ID you want to check (all_contig), here Contig_0 to Contig_10. setdiff then returns the IDs that appear in none of the selected columns, and length gives the count of missing IDs. For the full data set in the question, use 0:1193 instead of 0:10.

    cols <- c('Contig_A', 'Contig_B')
    # If there are many 'Contig' columns to consider, select them by name pattern instead:
    # cols <- grep('Contig', names(df), value = TRUE)
    
    all_contig <- paste0('Contig_', 0:10)
    
    missing_contig <- setdiff(all_contig, unlist(df[cols]))
    #[1] "Contig_2"  "Contig_6"  "Contig_7"  "Contig_8"  "Contig_10"
    
    count_missing <- length(missing_contig)
    #[1] 5
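
The snippet above assumes df already exists. As a minimal, self-contained sketch, the version below rebuilds the four example rows from the question and wraps the same setdiff approach in a helper. The function name find_missing_contigs and its max_id argument are illustrative assumptions and are not part of the original answer.

```r
# Hypothetical reconstruction of the example data frame from the question
df <- data.frame(
  Contig_A = c("Contig_0", "Contig_3", "Contig_4", "Contig_9"),
  Contig_B = c("Contig_1", "Contig_5", "Contig_1", "Contig_0"),
  stringsAsFactors = FALSE
)

# Illustrative helper (name and arguments are assumptions):
# returns the IDs Contig_0..Contig_<max_id> that occur in none of `cols`
find_missing_contigs <- function(data, cols, max_id) {
  all_contig <- paste0("Contig_", 0:max_id)
  # Flatten the selected columns into one vector of observed IDs
  observed <- unique(unlist(data[cols], use.names = FALSE))
  setdiff(all_contig, observed)
}

missing_contig <- find_missing_contigs(df, c("Contig_A", "Contig_B"), max_id = 9)
print(missing_contig)
print(length(missing_contig))
```

With max_id = 9 this should reproduce the count of 4 given in the question. For the real data, the same call with max_id = 1193 would cover Contig_0 to Contig_1193.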