I'm doing trend analysis and trying to use bar charts to visualize the frequencies of hashtags in different years, so that I can see the top 3 most frequent hashtag terms and how their frequencies evolve over the years. I have a dataset like this:
terms year
1 #A;#B;#C 2017
2 #B;#C;#D 2016
3 #C;#D;#E 2021
4 #D;#E;#F 2020
5 #E;#F;#G 2020
6 #F;#G;#H 2020
7 #G;#H;#I 2019
8 #H;#I;#J 2018
9 #I;#J;#K 2020
10 #J;#K;#L 2020
thanks!
Basically, we need to count each hashtag for every year. Since all the hashtags in a row are stored in a single column, we first separate them into different columns and then pivot the data frame into long format, which makes it possible to group by year and hashtag and find the counts.
library(tidyverse)
structure(list(terms = c("#A;#B;#C", "#B;#C;#D", "#C;#D;#E",
"#D;#E;#F", "#E;#F;#G", "#F;#G;#H", "#G;#H;#I", "#H;#I;#J", "#I;#J;#K",
"#J;#K;#L"), year = c(2017, 2016, 2021, 2020, 2020, 2020, 2019,
2018, 2020, 2020)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame")) -> df
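df %>%
  # split the three hashtags in each row into columns t1, t2, t3
  separate(terms, into = paste0("t", 1:3), sep = ";") %>%
  # long format: one row per (year, hashtag)
  pivot_longer(-year) %>%
  # number of occurrences of each hashtag in each year
  count(year, value) %>%
  ggplot(aes(x = year, y = n, fill = value, label = n)) +
  # use the same dodge width for bars and labels so they line up
  geom_col(position = position_dodge(width = 0.9)) +
  geom_text(position = position_dodge(width = 0.9), vjust = -0.3)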
Created on 2021-02-05 by the reprex package (v0.3.0)
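The plot above shows every hashtag. If only the top 3 are wanted, as the question asks, one possible approach is sketched below. It assumes "top 3" means the three hashtags with the highest total count across all years, and it uses `separate_rows()` instead of `separate()` so that rows may hold any number of hashtags. The names `counts` and `top3` are only illustrative. Note that in the sample data many hashtags tie at a total of 3, so which three are kept is arbitrary here.

```r
library(tidyverse)

# Same example data as in the question.
df <- tibble(
  terms = c("#A;#B;#C", "#B;#C;#D", "#C;#D;#E", "#D;#E;#F", "#E;#F;#G",
            "#F;#G;#H", "#G;#H;#I", "#H;#I;#J", "#I;#J;#K", "#J;#K;#L"),
  year = c(2017, 2016, 2021, 2020, 2020, 2020, 2019, 2018, 2020, 2020)
)

# One row per (year, hashtag), then count occurrences per year.
counts <- df %>%
  separate_rows(terms, sep = ";") %>%
  count(year, hashtag = terms, name = "n")

# Assumption: "top 3" = highest total count over all years.
# with_ties = FALSE forces exactly three rows even when totals tie.
top3 <- counts %>%
  count(hashtag, wt = n, name = "total") %>%
  slice_max(total, n = 3, with_ties = FALSE)

# Keep only the top hashtags and plot their yearly counts.
counts %>%
  semi_join(top3, by = "hashtag") %>%
  ggplot(aes(x = factor(year), y = n, fill = hashtag)) +
  geom_col(position = position_dodge(preserve = "single")) +
  labs(x = "year", y = "count", fill = "hashtag")
```

Treating `year` as a factor keeps the bars evenly spaced and avoids fractional tick labels on the x axis; if a "top 3 per year" ranking is wanted instead, the `slice_max()` step would need to be applied after `group_by(year)`.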