
Change words starting with pattern


I am analysing political speech and want to standardize some dialect words. I want to change all words starting with "fra" so that they start with "fre".

Example:

"frad walked into a bar" becomes "fred walked into a bar"

"are you frad" becomes "are you fred"

"are you afraid" should stay the same

How do I do this in R?

The speeches are stored in a data frame together with some metadata, where the variable text stores the speech of each politician within a year.


Solution

  • What you are looking for is a regular expression:

    text <- c("frad walked into a bar", "are you frad", "are you afraid")
    
    gsub("\\bfra", "fre", text)
    #> [1] "fred walked into a bar" "are you fred"           "are you afraid"
    

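    In this case, the \\b matches a word boundary, i.e. the position at the beginning or end of a word. Because the pattern requires a boundary right before "fra", it only matches words that start with "fra" and leaves "afraid" unchanged. You can use this cheat sheet to learn more or find another good resource.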
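
Since the question mentions that the speeches live in a data frame, here is a minimal sketch of applying the same replacement to a column. The data frame name `speeches` and the columns `politician` and `year` are made up for illustration; only the `text` variable comes from the question. The last lines show one possible way to also handle a capitalised "Fra", which the question does not ask for and is included only as an assumption about what might be needed.

```r
# Hypothetical data frame: `speeches`, `politician` and `year` are
# illustrative names; only the `text` column is named in the question.
speeches <- data.frame(
  politician = c("A", "B", "C"),
  year = c(2019, 2019, 2020),
  text = c("frad walked into a bar", "are you frad", "are you afraid"),
  stringsAsFactors = FALSE
)

# gsub() is vectorised, so the whole column can be rewritten in one call.
speeches$text <- gsub("\\bfra", "fre", speeches$text)

# Optional variant (an assumption, not part of the question): keep the
# original case of the first letter by capturing it and reusing it via \\1.
capitalised <- gsub("\\b([Ff])ra", "\\1re", c("Frad is here", "are you afraid"))
```

After running this, `speeches$text` should hold the three standardized sentences, with "are you afraid" untouched, and the other columns are left as they were.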