
How to visualize hashtags in R and see their trends over time?


I'm doing trend analysis and trying to use bar charts to visualize the frequencies of hashtags in different years, so that I can see the top 3 most frequent hashtag terms and how their frequencies evolve over the years. I have a dataset like this:

    terms          year
1   #A;#B;#C       2017
2   #B;#C;#D       2016
3   #C;#D;#E       2021
4   #D;#E;#F       2020
5   #E;#F;#G       2020
6   #F;#G;#H       2020
7   #G;#H;#I       2019
8   #H;#I;#J       2018
9   #I;#J;#K       2020
10  #J;#K;#L       2020

thanks!


Solution

  • Basically, we need to count each hashtag for every year. Since all the hashtags of a row are stored in a single column, we first separate them into different columns and then pivot the data frame into long format, where we can group by year and hashtag to get the counts.

    library(tidyverse)
    
    # example data from the question
    df <- tibble(
      terms = c("#A;#B;#C", "#B;#C;#D", "#C;#D;#E", "#D;#E;#F", "#E;#F;#G",
                "#F;#G;#H", "#G;#H;#I", "#H;#I;#J", "#I;#J;#K", "#J;#K;#L"),
      year  = c(2017, 2016, 2021, 2020, 2020, 2020, 2019, 2018, 2020, 2020)
    )
    
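    df %>% 
       # split "#A;#B;#C" into three columns, one hashtag each
       separate(terms, into = paste0("t", 1:3), sep = ";") %>% 
       # long format: one row per (year, hashtag)
       pivot_longer(-year) %>% 
       # number of occurrences of each hashtag in each year
       count(year, value) %>% 
       ggplot(aes(x = year, y = n, fill = value, label = n)) +
       geom_col(position = position_dodge(width = 0.9)) +
       # same dodge width as the bars so each label sits on its own bar
       geom_text(position = position_dodge(width = 0.9), vjust = -0.3)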
    

    Created on 2021-02-05 by the reprex package (v0.3.0)
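
The question also asks for the top 3 most frequent hashtags, which the plot above does not filter for. A minimal sketch of one way to do that, assuming "top 3" means the three hashtags with the highest total count over all years (ties are broken arbitrarily here, and in this small example most hashtags are tied). It uses `separate_rows()` instead of `separate()`, so rows may hold any number of hashtags. The names `counts`, `top_terms`, `trend` and `p` are only illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same example data as in the question
df <- tibble(
  terms = c("#A;#B;#C", "#B;#C;#D", "#C;#D;#E", "#D;#E;#F", "#E;#F;#G",
            "#F;#G;#H", "#G;#H;#I", "#H;#I;#J", "#I;#J;#K", "#J;#K;#L"),
  year  = c(2017, 2016, 2021, 2020, 2020, 2020, 2019, 2018, 2020, 2020)
)

# One row per (year, hashtag), however many hashtags a row contains
counts <- df %>%
  separate_rows(terms, sep = ";") %>%
  count(year, terms, name = "n")

# Assumption: "top 3" = highest total count over all years
top_terms <- counts %>%
  count(terms, wt = n, name = "total") %>%
  slice_max(total, n = 3, with_ties = FALSE) %>%
  pull(terms)

# Keep only the top hashtags; add explicit zeros for years without them
trend <- counts %>%
  filter(terms %in% top_terms) %>%
  complete(year = full_seq(df$year, 1), terms, fill = list(n = 0))

# Dodged bars per year, one bar per top hashtag
p <- ggplot(trend, aes(x = factor(year), y = n, fill = terms)) +
  geom_col(position = position_dodge(width = 0.9)) +
  labs(x = "year", y = "count", fill = "hashtag")
```

Call `print(p)` to draw the chart. Filling the missing year/hashtag combinations with zeros keeps every year on the axis, which makes it easier to see when a hashtag appears or disappears.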