df <- dataframe$Data %>%
  na.omit() %>%
  tolower() %>%
  strsplit(split = " ") %>%
  unlist() %>%
  table() %>%
  sort(decreasing = TRUE)
Hey guys, I'm using these functions to get a list of word frequencies (I'm working with a giant text), but I'm getting repeated words like "banana", "banana.", "banana?", etc., and they are being counted separately. How do I remove the periods, question marks, and other punctuation so that "banana" is counted correctly? Thanks!
Try stripping the punctuation with gsub() before building the table:
df <- dataframe$Data %>%
  na.omit() %>%
  tolower() %>%
  strsplit(split = " ") %>%
  unlist() %>%
  gsub('[[:punct:]]', '', .) %>%  # remove every punctuation character from each word
  table() %>%
  sort(decreasing = TRUE)
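
To check the idea on something small, here is a minimal sketch of the same pipeline run on made-up data. The sample `dataframe` and the name `word_counts` are only assumptions for illustration, and the `%>%` pipe is assumed to come from magrittr. This variant also strips punctuation before splitting, splits on runs of whitespace, and drops empty strings, so double spaces or tokens that were only punctuation don't show up as a blank "word":

```r
library(magrittr)  # provides the %>% pipe (also re-exported by dplyr)

# Hypothetical sample data standing in for dataframe$Data
dataframe <- data.frame(
  Data = c("Banana banana. BANANA?", NA, "apple, banana!  apple"),
  stringsAsFactors = FALSE
)

word_counts <- dataframe$Data %>%
  na.omit() %>%
  tolower() %>%
  gsub("[[:punct:]]", "", .) %>%          # strip punctuation first
  strsplit(split = "[[:space:]]+") %>%    # split on one or more whitespace characters
  unlist() %>%
  .[nzchar(.)] %>%                        # drop empty strings left over after cleaning
  table() %>%
  sort(decreasing = TRUE)

print(word_counts)
# banana is counted 4 times and apple 2 times
```

Keep in mind that `[[:punct:]]` also removes apostrophes and hyphens, so "don't" becomes "dont". If that matters for your text, you would need a narrower pattern.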