Tags: r, string, dataframe, grepl, startswith

Create a Column Based on Another Column Using grepl


Let's consider a df with two columns, word and stem. I want to create a new column that checks whether the value in stem is contained in word and whether it is preceded or followed by additional characters. The final result should look like this:

WORD     STEM     NEW
rerun    run      prefixed
runner   run      suffixed
run      run      none
...      ...      ...

Below is my code so far. However, it does not work because the grepl expression is applied across all rows of the df at once instead of comparing each word with its own stem. Still, I think it makes my idea clear.

df$new <- ifelse(grepl(paste0('.+', df$stem, '.+'), df$word), 'both',
             ifelse(grepl(paste0(df$stem, '.+'), df$word), 'suffixed',
                ifelse(grepl(paste0('.+', df$stem), df$word), 'prefixed','none')))
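
To make the problem concrete, here is a minimal sketch with made-up data (the two-row df below is an assumption for illustration, not the asker's data). grepl expects a single pattern, so the sketch passes only the first stem explicitly to mimic that, and then contrasts it with a row-wise comparison via mapply, which is one possible way to pair each stem with its own word.

```r
# Hypothetical data with two different stems, so the mismatch is visible
df <- data.frame(word = c("rerun", "walker"),
                 stem = c("run", "walk"),
                 stringsAsFactors = FALSE)

# Only the first stem is used as the pattern for every word
first_only <- grepl(df$stem[1], df$word, fixed = TRUE)

# Pair each stem with its own word for a true row-wise comparison
row_wise <- mapply(function(s, w) grepl(s, w, fixed = TRUE),
                   df$stem, df$word, USE.NAMES = FALSE)

print(first_only)
print(row_wise)
```

With a single pattern, "walker" is tested against "run" and does not match, whereas the row-wise version tests it against "walk". The fixed = TRUE argument treats each stem as a literal string rather than a regular expression.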

Solution

  • You can create the new column with startsWith and endsWith, which compare each word with its own stem:

    df$new <- ifelse(startsWith(df$word, df$stem) & endsWith(df$word, df$stem), 'none',
                     ifelse(startsWith(df$word, df$stem), 'suffixed',
                            ifelse(endsWith(df$word, df$stem), 'prefixed',
                                   'both')))
    

    Or, if you are in a dplyr pipeline and want to avoid all the annoying df$ prefixes:

    library(dplyr)

    df %>%
      mutate(new = ifelse(startsWith(word, stem) & endsWith(word, stem), 'none',
                          ifelse(startsWith(word, stem), 'suffixed',
                                 ifelse(endsWith(word, stem), 'prefixed',
                                        'both'))))
    

    Output

    #       word stem      new
    # 1    rerun  run prefixed
    # 2   runner  run suffixed
    # 3      run  run     none
    # 4    aruna  run     both
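
For anyone who wants to run the answer end to end, the following is a self-contained sketch. The data frame is rebuilt from the four rows shown in the output, and classify_affix is a hypothetical helper name introduced here for readability; it is not part of base R or of the original answer.

```r
# Sample data rebuilt from the rows shown in the output
df <- data.frame(word = c("rerun", "runner", "run", "aruna"),
                 stem = c("run", "run", "run", "run"),
                 stringsAsFactors = FALSE)

# classify_affix() is a hypothetical helper, not a base R function
classify_affix <- function(word, stem) {
  starts <- startsWith(word, stem)  # TRUE where word begins with its stem
  ends   <- endsWith(word, stem)    # TRUE where word ends with its stem
  ifelse(starts & ends, "none",
         ifelse(starts, "suffixed",
                ifelse(ends, "prefixed", "both")))
}

df$new <- classify_affix(df$word, df$stem)
print(df)
```

Note that "both" is simply the fallback branch: a word that neither starts nor ends with its stem gets this label even if the stem does not occur in the word at all. If that case matters for your data, an extra check (for example with grepl and fixed = TRUE, applied row by row) would be needed.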