
Find values in data frame that don't contain specific string and replace them in R


Is there an elegant way to find values that don't contain specific strings/characters and replace them in R? I kinda want the opposite of str_replace_all: instead of replacing the matched pattern, I keep it and replace everything else.

I did create a for loop that works, but with over 15,000 rows in the real data set it's much too slow for any practical purpose.

Reprex of the for loop

Data frame. The goal will be to keep any value with the '-' character and omit the others:

df<- data.frame(x = c('cat', 'd-g', 'rat'),
            y = c('-water', 'air', 'earth'),
            Z = c('run', 'walk', 'jump-'))

Empty data frame to put values in at the end of the for loop

empty.df<- data.frame(x = NULL,
                  y = NULL,
                  z = NULL)

The loop

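library(stringr)

for(i in 1:ncol(df)){
  df_col <- df[,i]
  
  for(m in 1:length(df_col)){
    # replace any value that does not match the pattern
    if(!str_detect(df_col[m], '-|oov|usgs')){
      df_col[m] <- '.'
    }
  }
  # each column is appended as a row, so the result is transposed
  empty.df<-rbind(empty.df, df_col)
}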

Here the outer loop pulls each column out as a vector and assigns it to the df_col object. The inner loop then goes through each value and, if str_detect returns FALSE, replaces it with '.'.

Result:

>     empty.df
>         X... X.d.g. X....1
>     1      .    d-g      .
>     2 -water      .      .
>     3      .      .  jump-

This is the desired result, but the run time is way too slow for practical use.


Solution

  • We could use str_detect to check each value for "-": values that contain it are left unchanged, and all others are replaced with ".".

    library(dplyr)
    df |>
      mutate(across(everything(),
                    ~if_else(stringr::str_detect(.x, "-"),
                             .x, ".")))
    

    Result

        x      y     Z
    1   . -water     .
    2 d-g      .     .
    3   .      . jump-
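
If adding dplyr is not an option, the same idea can probably be sketched in base R. This is only an illustration and not part of the original answer: keep_matches is a hypothetical helper name, and it assumes every column can be treated as character. Like str_detect, grepl is vectorised, so no explicit loop over rows is needed.

```r
df <- data.frame(x = c('cat', 'd-g', 'rat'),
                 y = c('-water', 'air', 'earth'),
                 Z = c('run', 'walk', 'jump-'))

# Hypothetical helper (not from the original answer):
# keep values that match `pattern`, replace everything else with `fill`.
keep_matches <- function(col, pattern = "-", fill = ".") {
  col <- as.character(col)                 # assumption: columns are text
  ifelse(grepl(pattern, col), col, fill)   # grepl tests the whole vector at once
}

out <- df
out[] <- lapply(df, keep_matches)          # out[] keeps the data frame shape and column names
out
```

Passing pattern = "-|oov|usgs" would apply the wider pattern from the question in the same way.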