Tags: r, string, dataframe, text-mining, tweets

R: Count the Frequency of a Custom Dictionary in a Dataframe Column, Grouped by Name and Year


I have a task that is too complex for my R knowledge. I have a dataframe of Tweet data, including columns for the username, the date of the tweet, and the content of the tweet. It looks like this (screenshot: Datastructure).
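Since the screenshot is not shown here, a minimal mock of that structure might look like the following; the column names user_username, year and text are the ones used in the code further down, and the sample rows are purely invented:

library(tibble)

# invented example rows; the real dataframe has ~30k rows and 200+ usernames
tweetsanalysis1 <- tibble(
  user_username = c("alice", "alice", "bob"),
  year          = c(2020, 2021, 2020),
  text          = c("one small step", "two and eleven", "nothing matching here")
)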

I have dictionaries of words, for example:

authority_dic <- c("one", "two", "eleven")

I want to count the frequency of these words within the tweets, but grouped by year and username.

I count the frequency by using this:

freq_auth <- tweetsanalysis1 %>%
  mutate(authority_dic = str_c(str_extract(text, str_c(authority_dic, collapse = '|')))) %>%
  count(authority_dic, name = 'freq_word') %>%
  arrange(desc(freq_word))

It works just like it should (screenshot: Output).

But it counts across all names and dates. How do I count the frequency for each name separately, split by year? I want to analyse each individual name's word frequency and then attach the name and the date of the tweet to the output.

Should I maybe cut the dataframe into small pieces by name and year and then run the analysis on each piece? My dataset contains 30k observations and over 200 individual names, so that would take a lot of time.

I hope I was able to get my point across. If not, just ask me. :) Any help would be greatly appreciated. Thanks in advance.


Solution

  • Try group_by() and summarise(); afterwards you can spread() to create a column for each year.

    See if this works for you (a quick check on mock data is sketched after the code):

    freq_auth <- tweetsanalysis1 %>%
      mutate(authority_dic = str_c(str_extract(text, str_c(authority_dic, collapse = '|')))) %>%
      group_by(authority_dic, year, user_username) %>%
      summarise(freq_word = n()) %>%
      arrange(desc(freq_word)) %>%
      spread(year, freq_word)
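
    As a quick sanity check, the same idea can be run on the mock data sketched in the question above. This assumes dplyr, stringr and tidyr are loaded and that the dictionary vector is called authority_dic; in current tidyr, pivot_wider() is the successor to spread(), so that variant is used here:

    library(dplyr)
    library(stringr)
    library(tidyr)

    authority_dic <- c("one", "two", "eleven")

    freq_auth <- tweetsanalysis1 %>%
      # keep the first dictionary word matched in each tweet (NA if none matches);
      # the new column reuses the name of the dictionary vector, as in the post above
      mutate(authority_dic = str_extract(text, str_c(authority_dic, collapse = '|'))) %>%
      group_by(authority_dic, year, user_username) %>%
      summarise(freq_word = n(), .groups = "drop") %>%
      arrange(desc(freq_word)) %>%
      # one column per year, holding the per-name counts
      pivot_wider(names_from = year, values_from = freq_word)

    Two caveats worth noting: the regex matches substrings, so "one" would also hit inside "someone" (add word boundaries with \\b if that matters), and str_extract() only records the first dictionary word per tweet; to count every occurrence you would need str_extract_all() plus unnest().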