r  word-frequency

How do I remove punctuation from the strings in my data frame?


    df <- dataframe$Data %>%
      na.omit() %>%
      tolower() %>%
      strsplit(split = " ") %>%
      unlist() %>%
      table() %>%
      sort(decreasing = TRUE)

Hey guys, I'm using these functions to get a list of word frequencies (I'm working with a giant text), but words like "banana", "banana.", "banana?" etc. are being counted separately. How do I delete the periods, question marks, and other punctuation so that they all count as "banana"? Thanks!


Solution

  • Try stripping the punctuation with gsub() and the [[:punct:]] character class before calling table():

    library(magrittr)  # provides %>% and the . placeholder

    df <- dataframe$Data %>%
      na.omit() %>%
      tolower() %>%
      strsplit(split = " ") %>%
      unlist() %>%
      gsub('[[:punct:]]', '', .) %>%  # "banana." and "banana?" become "banana"
      table() %>%
      sort(decreasing = TRUE)
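
    For anyone who wants to check the idea on a small input first, here is a minimal base-R sketch of the same approach. The sample data frame is made up for illustration, and two things go beyond the pipeline above as assumptions about what you probably want: splitting on any run of whitespace rather than a single space, and dropping tokens that are empty once the punctuation is gone (for example a lone "-").

```r
# Hypothetical sample data standing in for dataframe$Data
dataframe <- data.frame(
  Data = c("Banana? I like banana.", "banana - and apple!", NA),
  stringsAsFactors = FALSE
)

# Drop NAs, lowercase, and split on runs of whitespace
words <- unlist(strsplit(tolower(na.omit(dataframe$Data)), split = "[[:space:]]+"))

# Remove punctuation so "banana?" and "banana." collapse into "banana"
words <- gsub("[[:punct:]]", "", words)

# Discard tokens that consisted only of punctuation
words <- words[words != ""]

# Tabulate and sort, most frequent word first
freq <- sort(table(words), decreasing = TRUE)
print(freq)
```

    With this sample, "banana" ends up with a count of 3 instead of being split across "banana", "banana." and "banana?". Note that [[:punct:]] also removes apostrophes and hyphens inside words, so "don't" becomes "dont"; if that matters for your text, a narrower pattern may be a better fit.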