I am creating a stacked bar chart showing each person's most frequent words. I have managed to stack all the tokens using a cumulative sum, but I have two problems:
For Don, the colour of each bar segment does not match the text label drawn on it. I do not know why this happens, since for Anna the stacked words and the colours line up correctly.
The first word in Anna's column is an emoji. It shows up as an emoji in my data frame, but not in the ggplot. Any idea how to make ggplot display emojis?
This is a subset of my dataset:
structure(list(person = c("Don", "Anna", "Anna", "Anna", "Anna",
"Don", "Anna", "Don", "Don", "Don"), tokens = c("hey", "\U0001f44d\U0001f3fc",
"im", "yeh", "xx", "https", "guys", "yeah", "guys", "im"), n = c(13L,
14L, 17L, 17L, 18L, 21L, 22L, 22L, 27L, 32L), freq = c(0.00727476217123671,
0.0149413020277481, 0.0181430096051227, 0.0181430096051227, 0.0192102454642476,
0.0117515388919978, 0.0234791889007471, 0.0123111359820929, 0.0151091214325686,
0.0179071068830442), cumcount = c(13L, 14L, 31L, 48L, 66L, 34L,
88L, 56L, 83L, 115L)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L))
This is the code:
plot2 %>%
  dplyr::filter(grepl("Anna|Don", person)) %>%
  group_by(person) %>%
  arrange(n) %>%
  top_n(5, n) %>%
  mutate(cumcount = cumsum(n)) %>%
  ungroup() %>%
  ggplot(aes(x = person, y = n, fill = tokens, color = factor(tokens))) +
  geom_col() +
  geom_text(aes(y = cumcount, label = tokens), vjust = 1.6, color = "black", size = 2.5) +
  theme_minimal() +
  theme(legend.position = "none")
plot2
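I think you don't need the pre-processing step that calculates the cumulative sum, because geom_col will stack the bars for you. If you give geom_text the same position_stack, the labels are stacked in the same order as the bar segments, so each label lands on its own colour.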
library(ggplot2)
ggplot(DF, aes(x = person, y = n, fill = tokens)) +
  geom_col(position = position_stack(reverse = TRUE)) +
  geom_text(aes(label = tokens),
            position = position_stack(vjust = 0.5, reverse = TRUE))
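As for why the original labels drifted: the cumulative sum was computed in order of n, whereas geom_col stacks the segments in the order of the fill variable (alphabetical for a character column), so the two orders only agree by coincidence. Here is a minimal self-contained sketch that rebuilds DF from the subset posted in the question and checks that, with a shared position_stack, every label sits in the middle of its own segment. The names bars, labels, idx and centred are just illustrative helpers, not part of the original code.

```r
library(ggplot2)

# Subset of the data posted in the question (cumcount is not needed any more).
DF <- data.frame(
  person = c("Don", "Anna", "Anna", "Anna", "Anna",
             "Don", "Anna", "Don", "Don", "Don"),
  tokens = c("hey", "\U0001f44d\U0001f3fc", "im", "yeh", "xx",
             "https", "guys", "yeah", "guys", "im"),
  n = c(13L, 14L, 17L, 17L, 18L, 21L, 22L, 22L, 27L, 32L),
  stringsAsFactors = FALSE
)

p <- ggplot(DF, aes(x = person, y = n, fill = tokens)) +
  geom_col(position = position_stack(reverse = TRUE)) +
  geom_text(aes(label = tokens),
            position = position_stack(vjust = 0.5, reverse = TRUE))

# Pull out the positions ggplot2 computed for each layer.
bars   <- layer_data(p, 1)  # geom_col: one row per bar segment
labels <- layer_data(p, 2)  # geom_text: one row per label

# Pair each label with the segment that has the same x position and group.
idx <- match(paste(as.numeric(labels$x), labels$group),
             paste(as.numeric(bars$x), bars$group))

# TRUE if every label is centred vertically on its matching segment.
centred <- isTRUE(all.equal(as.numeric(labels$y),
                            (bars$ymin[idx] + bars$ymax[idx]) / 2))
centred
```

Printing p draws the chart; the check at the end only inspects the computed positions, so it does not depend on whether your graphics device can render the emoji.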
Regarding your emoji problem, I cannot reproduce it: the emoji is displayed correctly on my machine. Maybe you should check your versions of R and ggplot2.
Mine are:
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 19.2
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_3.2.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.3 digest_0.6.23 withr_2.1.2 assertthat_0.2.1 crayon_1.3.4
[6] dplyr_0.8.4 grid_3.6.3 R6_2.4.1 lifecycle_0.1.0 gtable_0.3.0
[11] magrittr_1.5 scales_1.1.0 pillar_1.4.3 rlang_0.4.4 farver_2.0.3
[16] lazyeval_0.2.2 rstudioapi_0.11 labeling_0.3 tools_3.6.3 glue_1.3.1
[21] purrr_0.3.3 munsell_0.5.0 compiler_3.6.3 pkgconfig_2.0.3 colorspace_1.4-1
[26] tidyselect_1.0.0 tibble_2.1.3
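To compare your setup with the one above, something like the following could be run on your side. It is only a rough sketch: it prints the R and ggplot2 versions, reports whether the session uses a UTF-8 locale, and confirms that the token is stored as the expected code points. The variable names thumb and code_points are illustrative. Passing these checks does not by itself guarantee that your graphics device has a font with the emoji glyph.

```r
# Versions to compare against the session info above.
cat(R.version.string, "\n")
cat("ggplot2:", as.character(packageVersion("ggplot2")), "\n")

# Is the current locale UTF-8? (The session above uses en_US.UTF-8.)
cat("UTF-8 locale:", isTRUE(l10n_info()[["UTF-8"]]), "\n")

# The emoji token from the question: thumbs up plus a skin-tone modifier.
thumb <- "\U0001f44d\U0001f3fc"

# Show the code points actually stored in the string.
code_points <- sprintf("U+%X", utf8ToInt(thumb))
print(code_points)
```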