Search code examples
rdataframedplyrdata-manipulation

Indexing values with a pmin for rounding to nearest hundredth value in mutate_at dplyr


I have a some numeric variables in sets and their total in a separate variable. I am trying to suppress a cell below certain threshold value and round the next minimum value to nearest hundredth value so the masked value can't be reverse engineered.

My data looks like this.

id Total VarX1 VarX2 VarX3 VarX4 VarY1 ........    VarZ12
1 1500 25 400 800 275 180 60
2 1300 400 35 750 115 30 110

The masked data should look something like this

id Total VarX1 VarX2 VarX3 VarX4 VarY1 ...VarZ12
1 1500 < 100 400 800 <300 180 <100
2 1200 400 <100 750 <200 <100 110

I added two extra variables VarY1 and VarZ12 to explain the need of using mutate_at() function from dplyr as the variable numbers are pretty high but with a repeating string that can used within contains(). I am trying to make it reproducible as its a recurring task that I plan to do.

The approach that I am taking is first assigning NAs to the values less than 100 and then I trying to somehow index pmin value and round it to the nearest hundredth and convert it to character to add '<' while also replacing NAs with character '<100'.

I have gotten to the part of assigning NAs to values <100 but I have hit a major roadblock for replacing the next minimum value of the set of variables.

Here is the code that I have so far for replacing with NA to being with.

data %>%
mutate_at(vars(matches("VarsX"), ~ case_when(.x < 100 ~ NA,
                                             T ~ .x)) 

What should I do next? Any direction or code help is highly appreciated. Thanks.


Solution

  • Using dplyr::rowwise() and near() to find minima:

    library(dplyr)
    
    data %>%
      rowwise() %>%
      mutate(
        across(matches("Var"), \(x) ifelse(x < 100, NA, x)), 
        across(
          matches("VarX"),
          \(x) if_else(
            near(x, min(c_across(matches("VarX")), na.rm = TRUE)),
            paste0("<", 100 * ceiling(x / 100)),
            as.character(x)
          )
        ),
        across(matches("Var"), \(x) if_else(is.na(x), "<100", as.character(x)))
      ) %>%
      ungroup()
    
    # A tibble: 2 × 8
         id Total VarX1 VarX2 VarX3 VarX4 VarY1 VarZ12
      <int> <int> <chr> <chr> <chr> <chr> <chr> <chr> 
    1     1  1500 <100  400   800   <300  180   <100  
    2     2  1300 400   <100  750   <200  <100  110   
    

    Note that mutate_at() and friends have been superseded by mutate(across()), as used above.