Modifying a string using dplyr mutate() is returning the same value for all rows


I am trying to change the text in a column by removing a word from every row.

For example, if a row contains "second row", I want to replace it with "second". Here is an example dataset:

df <- data.frame(row_num = c(1,2,3,4,5),
                 text_long = c("first row", "second row", "third row", "fourth row", "fifth row"))

> df
#  row_num  text_long
#1       1  first row
#2       2 second row
#3       3  third row
#4       4 fourth row
#5       5  fifth row

How do I create a new column called "text_short" where the word "row" is removed from each of the text_long values?

I tried using the following function with mutate() from the dplyr package:

library(dplyr)

shorten_text <- function(x) {
  return(unlist(strsplit(x, split=' '))[1])
}

df <- mutate(df, text_short = shorten_text(df$text_long))

But every row of text_short contains the same value:

> df
#  row_num  text_long text_short
#1       1  first row      first
#2       2 second row      first
#3       3  third row      first
#4       4 fourth row      first
#5       5  fifth row      first

Solution

Use a vectorized string function inside mutate(), for example stringr::str_remove():

df %>%
  mutate(text_short = stringr::str_remove(text_long, " row"))

or base R:

df$text_short <- gsub(" row", "", df$text_long)
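
As for why the original attempt gives "first" in every row: strsplit() returns a list with one character vector per input string, unlist() flattens that list into a single vector ("first", "row", "second", "row", ...), and [1] then picks only its first element. mutate() recycles that single value down the whole column. If you would rather keep a helper like shorten_text(), here is a minimal sketch of a vectorized version, assuming you always want the first space-separated word (stringsAsFactors = FALSE is added here only so the column is character on older R versions):

```r
# Same example data as in the question
df <- data.frame(row_num = c(1, 2, 3, 4, 5),
                 text_long = c("first row", "second row", "third row",
                               "fourth row", "fifth row"),
                 stringsAsFactors = FALSE)

# strsplit() gives one character vector per input string;
# vapply() takes the first word of each, so the result has one entry per row
shorten_text <- function(x) {
  vapply(strsplit(x, split = " ", fixed = TRUE),
         function(words) words[1],
         character(1))
}

df$text_short <- shorten_text(df$text_long)
```

Inside dplyr the same helper can be called as mutate(df, text_short = shorten_text(text_long)); there is no need to write df$text_long, since mutate() already looks up column names in df.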