Search code examples
rreplacetidyversedata-cleaningdplyr

Replace column values with NA based on a different column or row position with tidyverse


Below is a smaller version of a much larger tibble I have and I want to replace the values in reflectanceSfp and in reflectanceDT to NA based on the values in the bandNumber column or based on their row position. I would like to solve this specifically with tidyverse pipes and associated packages.

 reflectanceSfp wavelength bandNumber reflectanceDT wavelength1
     -0.0113          376       1.00      0.000148         377
     -0.000592        381       2.00      0.00589          382
      0.0158          386       3.00      0.0101           387
      0.0200          391       4.00      0.0110           392
      0.0240          396       5.00      0.0117           397
      0.0265          401       6.00      0.0149           402

So I have the following bad bands list which are the band numbers that I would want to replace with NAs:

badBands <- c(1:2,6)

I have tried something in this format just to see what it would do

m2 <- myData %>%
  mutate(reflectanceSfp = case_when(bandNumber == 1.00 ~ NA ))

But, would ultimately want the bad bands vector to be in the pipe and have tried to understand the use of modify_at and mutate_at.

I would like the resulting dataset to look like

 reflectanceSfp wavelength bandNumber reflectanceDT wavelength1
          NA          376       1.00      0.000148         377
          NA          381       2.00      0.00589          382
      0.0158          386       3.00      0.0101           387
      0.0200          391       4.00      0.0110           392
      0.0240          396       5.00      0.0117           397
          NA          401       6.00      0.0149           402

Below is a dput version of my table:

myData <- structure(list(reflectanceSfp = c(-0.011258, -0.000592, 0.015815, 
0.019991, 0.023965, 0.026547), wavelength = c(376.440002, 381.450012, 
386.459991, 391.470001, 396.470001, 401.480011), bandNumber = c(1, 
2, 3, 4, 5, 6), reflectanceDT = c(0.00014819, 0.00589207, 0.01012335, 
0.01101705, 0.01165185, 0.01486412), wavelength1 = c(376.6300049, 
381.6400147, 386.6499939, 391.6600037, 396.6600037, 401.6700134
)), .Names = c("reflectanceSfp", "wavelength", "bandNumber", 
"reflectanceDT", "wavelength1"), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

Solution

  • As the 'badBands' have length greater than 1, use %in% instead of ==, also the case_when is type sensitive, so it is better to have the correct NA i.e. NA_real_ for the double column

    myData %>% 
        mutate(reflectanceSfp = case_when(bandNumber %in% badBands ~ NA_real_, 
                                       TRUE ~ reflectanceSfp))
    # A tibble: 6 x 5
    #  reflectanceSfp wavelength bandNumber reflectanceDT wavelength1
    #           <dbl>      <dbl>      <dbl>         <dbl>       <dbl>
    #1        NA            376.          1      0.000148        377.
    #2        NA            381.          2      0.00589         382.
    #3         0.0158       386.          3      0.0101          387.
    #4         0.0200       391.          4      0.0110          392.
    #5         0.0240       396.          5      0.0117          397.
    #6        NA            401.          6      0.0149          402.
    

    Or it is easier to use replace here, where we have to specify only the replacing value that satisfies the logical condition and without the type check

    myData %>%
          mutate(reflectanceSfp = replace(reflectanceSfp, 
                                     bandNumber %in% badBands, NA))