Search code examples
loopsdplyrapplydata-cleaningmutate

Looping over data frame to "clean" data


This is the kind of data I have:

Date Station Param1 Param2
2020-01-01 A <5 45
2020-02-01 B <5 47

To be able to plot this data, mark the LOQ-values (<5) and compute some basic statistics, I need to create new columns with the LOQ-flag (<) and numeric values separated.

I don't have exact knowledge of the Param-names (they are actually "Fe", "Cu", "N-tot" and so on), so I would like to loop over the Param-columns (not Date and Station) and create two new columns for each Param, one with the numerical data and one with the LOQ-flag. Like this:

Date Station Param1_org Param1_new Param1_loq Param2_org Param2_new Param2_loq
2020-01-01 A <5 5 < 45 45 =
2020-02-01 B <5 5 < 47 47 =

I have tried mutate (dplyr) but I am struggeling with how to use the conditions together with gsub inside mutate and across. I also considered using apply and a list of Params, but got lost in the code.

I need some advice on which approach to choose, and a simple example of how to achieve this. I appreciate all help given!


Solution

  • Here's the answer of your question

    library(tidyverse)
    
    data <- tibble(Date = c(as.Date("2020-01-01"), as.Date("2020-02-01")),
                      Station = c("A", "B"), 
                      Param1 = c("<5", "<5"),
                      Param2 = c("45", "47"))
    
    cols <- colnames(data)
    param_cols <- cols[str_detect(cols, "^Param")]
    
    
    for (col in param_cols) {
      col_name <- paste(col, "org", sep = "_")
      col_new<- paste(col, "new", sep = "_")
      col_loq <- paste(col, "loq", sep = "_")
      data <-data %>% 
        mutate(!!col_name := get(col), 
               !!col_new := str_extract(get(col), "\\d+"),
               !!col_loq := ifelse(str_detect(get(col), "^\\d"), 
                                   "=", 
                                   ifelse(str_detect(get(col), "^<"), "<", ">")
                                   ),
               !!col := NULL
               )
    }
    
    print(data)
    

    enter image description here

    What I did is simply looping through all the columns contain Param and using mutate (again with another regex detection). The !! is just escaping for a variable to be able for being used on dplyr argument (note: dplyr version 1.0 or higher)