r, dataframe, conditional-statements

Set a conditional divide function with NA values present


I have a snippet of a larger data set whose values I am trying to convert to the same unit of measurement. I can convert the columns "unit1" and "legal1" to "ppm" units just fine, but I run into trouble when trying to convert "unit2" and "legal2" to "ppm" units.

When the unit in a "unit" column is "ppb", I need the value in the corresponding "legal" column to be divided by 1000.

Here is my progress so far:

contaminants <- c("Barium", "Magnanese", "Nitrate", "Nitrate & nitrite")
unit1 <- c("ppb", "ppb", "ppm", "ppm")
legal1 <- c(2000, 50, 10, 10)
legal2 <- c(2, 999999, 10, NA)
unit2 <- c("ppm", "ppb", "ppm", NA)

testdf = data.frame(contaminants, unit1, legal1, unit2, legal2)
testdf

I can successfully convert the "unit1" and "legal1" to "ppm" units with

testdf$legal1[testdf$unit1 == "ppb"] <- (testdf$legal1)/1000
testdf$unit1[testdf$unit1 == "ppb"] <- "ppm"

When I try to run the same code for "unit2" and "legal2"

testdf$legal2[testdf$unit2 == "ppb"] <- (testdf$legal2)/1000

I get the following error: "NAs are not allowed in subscripted assignments"

In this case, only the row with 'Magnanese' would need to be divided by 1000, but that's not happening.


Solution

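  • Here is another tidyverse solution. The issue is caused by the NA values in "unit2" and "legal2"; we can handle them by adding !is.na(.) to the condition inside across(). The remaining challenge is to look up the matching "unit" column for each "legal" column, so that only the "ppb" rows are divided: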

    library(dplyr)
    library(stringr)
    
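    testdf %>%
      # Divide each legal* column by 1000 where its partner unit* column is "ppb";
      # this runs first, while the unit* columns still hold their original values
      mutate(across(starts_with("legal"), 
                    ~ if_else(get(str_replace(cur_column(), "legal", "unit")) == "ppb" & !is.na(.), . / 1000, .),
                    .names = "{.col}"),
             # Then relabel the converted units; NA units stay NA
             across(starts_with("unit"), ~ifelse(. == "ppb", "ppm", .)))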
    

    Or, in base R with a custom function:

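    # Custom function: returns the legal column divided by 1000 where the
    # unit column is "ppb" and the legal value is not NA, otherwise unchanged
    transform_fun <- function(df, legal_col, unit_col) {
      ifelse(df[[unit_col]] == "ppb" & !is.na(df[[legal_col]]), df[[legal_col]] / 1000, df[[legal_col]])
    }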
    
    
    testdf$legal1 <- transform_fun(testdf, "legal1", "unit1")
    testdf$unit1 <- ifelse(testdf$unit1 == "ppb", "ppm", testdf$unit1)
    
    testdf$legal2 <- transform_fun(testdf, "legal2", "unit2")
    testdf$unit2 <- ifelse(testdf$unit2 == "ppb", "ppm", testdf$unit2)
    
    testdf
    
           contaminants unit1 legal1 unit2  legal2
    1            Barium   ppm   2.00   ppm   2.000
    2         Magnanese   ppm   0.05   ppm 999.999
    3           Nitrate   ppm  10.00   ppm  10.000
    4 Nitrate & nitrite   ppm  10.00  <NA>      NA
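
For a fix that stays close to the subsetting code in the question, the sketch below uses which(), a different technique from the two approaches above. It rests on this reading of the error: testdf$unit2 == "ppb" evaluates to NA in the row where "unit2" is missing, and R refuses an NA index in a subscripted assignment when more than one replacement value is supplied. which() returns only the positions where the comparison is TRUE, so the NA row is skipped. The sketch also subsets the right-hand side with the same positions; the "legal1" line in the question assigns the whole column into two slots, which gives the right values there only because the "ppb" rows happen to come first. The name is_ppb is introduced here purely for illustration.

```r
# Rebuild the question's data so the sketch runs on its own
testdf <- data.frame(
  contaminants = c("Barium", "Magnanese", "Nitrate", "Nitrate & nitrite"),
  unit1  = c("ppb", "ppb", "ppm", "ppm"),
  legal1 = c(2000, 50, 10, 10),
  unit2  = c("ppm", "ppb", "ppm", NA),
  legal2 = c(2, 999999, 10, NA),
  stringsAsFactors = FALSE
)

# The comparison is NA wherever unit2 is NA, which is what breaks the assignment
testdf$unit2 == "ppb"
# [1] FALSE  TRUE FALSE    NA

# which() keeps only the positions that are TRUE and drops the NA
is_ppb <- which(testdf$unit2 == "ppb")  # hypothetical helper name

# Index both sides with the same positions so the lengths always match
testdf$legal2[is_ppb] <- testdf$legal2[is_ppb] / 1000
testdf$unit2[is_ppb]  <- "ppm"

testdf
```

The same two assignments work for "unit1" and "legal1" after swapping the column names, and rows with a missing unit are left untouched.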