Let's consider a df with two columns, word and stem. I want to create a new column that checks whether the value in stem is contained in word and whether it is preceded or followed by additional characters. The final result should look like this:
WORD STEM NEW
rerun run prefixed
runner run suffixed
run run none
... ... ...
Below is my code so far. However, it does not work, because grepl does not match each stem against the word in the same row of the df. Anyway, I think it should make my idea clear.
df$new <- ifelse(grepl(paste0('.+', df$stem, '.+'), df$word), 'both',
ifelse(grepl(paste0(df$stem, '.+'), df$word), 'suffixed',
ifelse(grepl(paste0('.+', df$stem), df$word), 'prefixed','none')))
You can create the new column like this:
df$new <- ifelse(startsWith(df$word, df$stem) & endsWith(df$word, df$stem), 'none',
          ifelse(startsWith(df$word, df$stem), 'suffixed',
          ifelse(endsWith(df$word, df$stem), 'prefixed',
                 'both')))
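For a self-contained check, the sketch below rebuilds the sample data and applies the same nested ifelse. The four rows are an assumption taken from the example output in this answer, not the asker's real data. The reason this works where the grepl attempt did not: grepl uses only the first element of a pattern vector (with a warning), whereas startsWith() and endsWith() are vectorised over both arguments, so each word is compared with the stem in its own row.

```r
# Sample data assumed for illustration (rows mirror the example output)
df <- data.frame(
  word = c("rerun", "runner", "run", "aruna"),
  stem = c("run", "run", "run", "run"),
  stringsAsFactors = FALSE  # startsWith()/endsWith() need character vectors
)

# Row-wise flags: element i of word is compared with element i of stem
starts <- startsWith(df$word, df$stem)
ends   <- endsWith(df$word, df$stem)

# Same decision order as above: exact match first, then suffix, then prefix
df$new <- ifelse(starts & ends, "none",
          ifelse(starts, "suffixed",
          ifelse(ends, "prefixed", "both")))

print(df)
```

Computing starts and ends once also avoids evaluating each test twice inside the nested ifelse.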
Or, if you are in a dplyr pipeline and want to avoid repeating df$:
library(dplyr)

df %>%
  mutate(new = ifelse(startsWith(word, stem) & endsWith(word, stem), 'none',
               ifelse(startsWith(word, stem), 'suffixed',
               ifelse(endsWith(word, stem), 'prefixed',
                      'both'))))
Output:
#     word stem      new
# 1 rerun run prefixed
# 2 runner run suffixed
# 3 run run none
# 4 aruna run both
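One caveat: the final 'both' branch is reached whenever a word neither starts nor ends with the stem, and that includes words that do not contain the stem at all. If that can happen in your data, a guard along the following lines may help. This is only a sketch: the helper name classify_stem and the label 'absent' are made up for illustration, and it assumes both inputs are non-empty character vectors of the same length.

```r
# Hypothetical helper (name and "absent" label are illustrative assumptions)
classify_stem <- function(word, stem) {
  # Row-wise containment check; fixed = TRUE treats the stem as a literal
  # string rather than a regular expression
  contains <- mapply(grepl, stem, word,
                     MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
  starts <- startsWith(word, stem)
  ends   <- endsWith(word, stem)

  ifelse(!contains, "absent",
  ifelse(starts & ends, "none",
  ifelse(starts, "suffixed",
  ifelse(ends, "prefixed", "both"))))
}

words <- c("rerun", "runner", "run", "aruna", "walk")
stems <- rep("run", length(words))
classify_stem(words, stems)
# "prefixed" "suffixed" "none" "both" "absent"
```

mapply is used here because grepl itself is not vectorised over its pattern argument; it calls grepl once per (stem, word) pair.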