I have a some numeric variables in sets and their total in a separate variable. I am trying to suppress a cell below certain threshold value and round the next minimum value to nearest hundredth value so the masked value can't be reverse engineered.
My data looks like this.
id | Total | VarX1 | VarX2 | VarX3 | VarX4 | VarY1 | ........ VarZ12 |
---|---|---|---|---|---|---|---|
1 | 1500 | 25 | 400 | 800 | 275 | 180 | 60 |
2 | 1300 | 400 | 35 | 750 | 115 | 30 | 110 |
The masked data should look something like this
id | Total | VarX1 | VarX2 | VarX3 | VarX4 | VarY1 | ...VarZ12 |
---|---|---|---|---|---|---|---|
1 | 1500 | < 100 | 400 | 800 | <300 | 180 | <100 |
2 | 1200 | 400 | <100 | 750 | <200 | <100 | 110 |
I added two extra variables VarY1 and VarZ12 to explain the need of using mutate_at() function from dplyr as the variable numbers are pretty high but with a repeating string that can used within contains(). I am trying to make it reproducible as its a recurring task that I plan to do.
The approach that I am taking is first assigning NAs to the values less than 100 and then I trying to somehow index pmin value and round it to the nearest hundredth and convert it to character to add '<' while also replacing NAs with character '<100'.
I have gotten to the part of assigning NAs to values <100 but I have hit a major roadblock for replacing the next minimum value of the set of variables.
Here is the code that I have so far for replacing with NA to being with.
data %>%
mutate_at(vars(matches("VarsX"), ~ case_when(.x < 100 ~ NA,
T ~ .x))
What should I do next? Any direction or code help is highly appreciated. Thanks.
Using dplyr::rowwise()
and near()
to find minima:
library(dplyr)
data %>%
rowwise() %>%
mutate(
across(matches("Var"), \(x) ifelse(x < 100, NA, x)),
across(
matches("VarX"),
\(x) if_else(
near(x, min(c_across(matches("VarX")), na.rm = TRUE)),
paste0("<", 100 * ceiling(x / 100)),
as.character(x)
)
),
across(matches("Var"), \(x) if_else(is.na(x), "<100", as.character(x)))
) %>%
ungroup()
# A tibble: 2 × 8
id Total VarX1 VarX2 VarX3 VarX4 VarY1 VarZ12
<int> <int> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 1500 <100 400 800 <300 180 <100
2 2 1300 400 <100 750 <200 <100 110
Note that mutate_at()
and friends have been superseded by mutate(across())
, as used above.