r, na, gsub, coercion

Is there a way to identify where NAs are introduced?


I recently went through my fairly large dataset and realized someone decided to use commas in the numbers. I'm trying to convert it all to numeric. I used a nice little gsub to get rid of those pesky commas, but I'm still getting "NAs introduced by coercion" warnings. Is there a way to identify the location, by column and row, where those NAs are being introduced so I can see why that is happening?


Solution

  • Use the is.na() function. Consider the following data frame, which contains NA values, as an example:

    > df <- data.frame(v1=c(1,2,NA,4), v2=c(NA,6,7,8), v3=c(9,NA,NA,12))
    > df
      v1 v2 v3
    1  1 NA  9
    2  2  6 NA
    3 NA  7 NA
    4  4  8 12
    

    You can use is.na along with which and lapply (which, unlike sapply, always returns a list, even when every column has the same number of NAs) to get the following result:

    > lapply(df, function(x) which(is.na(x)))
    $v1
    [1] 3
    
    $v2
    [1] 1
    
    $v3
    [1] 2 3
    

    The result lists each column along with the row numbers where its NA values occur.
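
The approach above finds every NA, including values that were already missing before conversion. To isolate only the NAs that coercion introduces, one option is to compare the data before and after as.numeric(). The following is a minimal sketch under stated assumptions: the data frame `raw` and its contents are made up for illustration, and the names `cleaned`, `num`, `introduced`, `pos`, and `bad` are hypothetical, not taken from the question.

```r
# Hypothetical raw data: numbers stored as text, some with commas or stray characters
raw <- data.frame(
  a = c("1,200", "35", "n/a", "7"),
  b = c("4.5", "1,000.25", "12", "3 kg"),
  stringsAsFactors = FALSE
)

# Strip the commas from every column
cleaned <- as.data.frame(
  lapply(raw, function(x) gsub(",", "", x, fixed = TRUE)),
  stringsAsFactors = FALSE
)

# Coerce to numeric; the coercion warning is silenced because we inspect the NAs ourselves
num <- as.data.frame(lapply(cleaned, function(x) suppressWarnings(as.numeric(x))))

# TRUE only where coercion produced an NA that was not already missing in the raw data
introduced <- is.na(num) & !is.na(raw)

# Row and column positions of the introduced NAs
pos <- which(introduced, arr.ind = TRUE)

# Pair each position with the original text that failed to convert
bad <- data.frame(
  row    = pos[, "row"],
  column = colnames(raw)[pos[, "col"]],
  value  = as.matrix(raw)[pos],
  stringsAsFactors = FALSE
)
bad
```

With this sample data, `bad` flags row 3 of column a ("n/a") and row 4 of column b ("3 kg"), so the offending original strings can be inspected directly.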