This is the kind of data I have:
Date | Station | Param1 | Param2 |
---|---|---|---|
2020-01-01 | A | <5 | 45 |
2020-02-01 | B | <5 | 47 |
To be able to plot this data, mark the LOQ-values (<5) and compute some basic statistics, I need to create new columns with the LOQ-flag (<) and numeric values separated.
I don't have exact knowledge of the Param-names (they are actually "Fe", "Cu", "N-tot" and so on), so I would like to loop over the Param-columns (not Date and Station) and create two new columns for each Param, one with the numerical data and one with the LOQ-flag. Like this:
Date | Station | Param1_org | Param1_new | Param1_loq | Param2_org | Param2_new | Param2_loq |
---|---|---|---|---|---|---|---|
2020-01-01 | A | <5 | 5 | < | 45 | 45 | = |
2020-02-01 | B | <5 | 5 | < | 47 | 47 | = |
I have tried mutate (dplyr) but I am struggeling with how to use the conditions together with gsub inside mutate and across. I also considered using apply and a list of Params, but got lost in the code.
I need some advice on which approach to choose, and a simple example of how to achieve this. I appreciate all help given!
Here's the answer of your question
library(tidyverse)
data <- tibble(Date = c(as.Date("2020-01-01"), as.Date("2020-02-01")),
Station = c("A", "B"),
Param1 = c("<5", "<5"),
Param2 = c("45", "47"))
cols <- colnames(data)
param_cols <- cols[str_detect(cols, "^Param")]
for (col in param_cols) {
col_name <- paste(col, "org", sep = "_")
col_new<- paste(col, "new", sep = "_")
col_loq <- paste(col, "loq", sep = "_")
data <-data %>%
mutate(!!col_name := get(col),
!!col_new := str_extract(get(col), "\\d+"),
!!col_loq := ifelse(str_detect(get(col), "^\\d"),
"=",
ifelse(str_detect(get(col), "^<"), "<", ">")
),
!!col := NULL
)
}
print(data)
What I did is simply looping through all the columns contain Param and using mutate (again with another regex detection). The !!
is just escaping for a variable to be able for being used on dplyr argument (note: dplyr version 1.0 or higher)