r ggplot2 cumsum

Bar plot colours do not match the cumulative sum of the names' frequency


I am creating a stacked bar chart showing each person's most frequent words. I have managed to stack all the tokens using a cumulative sum, but I have two problems:

  1. The colours are not linked to the text labels in the bar chart for Don, and I do not know why, since for Anna the stacked words and colours match up correctly.

  2. The first word in Anna's column is an emoji. It shows up as an emoji in my data frame but not in the ggplot. Any idea how to make ggplot display emojis?

This is a subset of my dataset:

    structure(list(person = c("Don", "Anna", "Anna", "Anna", "Anna", 
    "Don", "Anna", "Don", "Don", "Don"), tokens = c("hey", "\U0001f44d\U0001f3fc", 
    "im", "yeh", "xx", "https", "guys", "yeah", "guys", "im"), n = c(13L, 
    14L, 17L, 17L, 18L, 21L, 22L, 22L, 27L, 32L), freq = c(0.00727476217123671, 
    0.0149413020277481, 0.0181430096051227, 0.0181430096051227, 0.0192102454642476, 
    0.0117515388919978, 0.0234791889007471, 0.0123111359820929, 0.0151091214325686, 
    0.0179071068830442), cumcount = c(13L, 14L, 31L, 48L, 66L, 34L, 
    88L, 56L, 83L, 115L)), class = c("tbl_df", "tbl", "data.frame"
    ), row.names = c(NA, -10L))

This is the code:

    library(dplyr)
    library(ggplot2)

    plot2 %>% 
      dplyr::filter(grepl("Anna|Don", person)) %>% 
      group_by(person) %>%
      arrange(n) %>% 
      top_n(5, n) %>% 
      mutate(cumcount = cumsum(n)) %>%   # running total of n within each person
      ungroup() %>% 
      ggplot(aes(x = person, y = n, fill = tokens, color = factor(tokens))) +
      geom_col() +
      # labels are placed at the manually computed cumulative sum
      geom_text(aes(y = cumcount, label = tokens), vjust = 1.6, color = "black", size = 2.5) +
      theme_minimal() +
      theme(legend.position = "none")
    plot2 



Solution

  • I think you don't need the pre-processing step that calculates the cumulative sum, because geom_col will stack the values for you. Giving geom_text the same position_stack() places each label in the middle of its own segment.

    library(ggplot2)
    ggplot(DF, aes(x = person, y = n, fill = tokens))+
      geom_col(position = position_stack(reverse = TRUE))+
      geom_text(aes(label = tokens), 
                position = position_stack(vjust = 0.5, reverse = TRUE))
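
The snippet above uses a data frame named `DF` that the answer does not define. Here is a minimal self-contained sketch, assuming `DF` holds the subset posted in the question (rebuilt with `data.frame()` rather than the original tibble). The `theme_minimal()`, `legend.position` and `size` settings are carried over from the question's code, not from the answer.

```r
library(ggplot2)

# Assumption: DF is the subset from the question; the emoji token is written
# with Unicode escapes so the script stays ASCII-only.
DF <- data.frame(
  person = c("Don", "Anna", "Anna", "Anna", "Anna",
             "Don", "Anna", "Don", "Don", "Don"),
  tokens = c("hey", "\U0001f44d\U0001f3fc", "im", "yeh", "xx",
             "https", "guys", "yeah", "guys", "im"),
  n = c(13L, 14L, 17L, 17L, 18L, 21L, 22L, 22L, 27L, 32L),
  stringsAsFactors = FALSE
)

p <- ggplot(DF, aes(x = person, y = n, fill = tokens)) +
  # geom_col stacks the segments itself; no cumsum() column is needed
  geom_col(position = position_stack(reverse = TRUE)) +
  # the same stacking (and the same reverse setting) is applied to the labels,
  # with vjust = 0.5 centring each label vertically in its segment
  geom_text(aes(label = tokens),
            position = position_stack(vjust = 0.5, reverse = TRUE),
            size = 2.5) +
  theme_minimal() +
  theme(legend.position = "none")

print(p)
```

Because both layers share the same stacking, each label should land inside its own segment. A likely reason the labels in the question drifted is that the bars are stacked in the order of the `fill` groups, not in order of `n`, so a cumulative sum computed after `arrange(n)` need not line up with the segments.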
    



    Regarding your emoji problem, I do not have any issues. Maybe you should check your versions of R and ggplot2; mine are:

    R version 3.6.3 (2020-02-29)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Linux Mint 19.2
    
    Matrix products: default
    BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
    LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
    
    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
     [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
    [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
    
    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base     
    
    other attached packages:
    [1] ggplot2_3.2.1
    
    loaded via a namespace (and not attached):
     [1] Rcpp_1.0.3       digest_0.6.23    withr_2.1.2      assertthat_0.2.1 crayon_1.3.4    
     [6] dplyr_0.8.4      grid_3.6.3       R6_2.4.1         lifecycle_0.1.0  gtable_0.3.0    
    [11] magrittr_1.5     scales_1.1.0     pillar_1.4.3     rlang_0.4.4      farver_2.0.3    
    [16] lazyeval_0.2.2   rstudioapi_0.11  labeling_0.3     tools_3.6.3      glue_1.3.1      
    [21] purrr_0.3.3      munsell_0.5.0    compiler_3.6.3   pkgconfig_2.0.3  colorspace_1.4-1
    [26] tidyselect_1.0.0 tibble_2.1.3
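
To compare your own setup with the session information above, a few base R calls can help. This is only a diagnostic sketch: it reports the R and ggplot2 versions, whether the session runs in a UTF-8 locale, and which code points the emoji string from the question contains. Whether the glyph is actually drawn also depends on the graphics device and the installed fonts, which this snippet does not check.

```r
# Versions of R and ggplot2 in the current session
print(R.version.string)
print(packageVersion("ggplot2"))

# TRUE when the session runs in a UTF-8 locale
print(l10n_info()[["UTF-8"]])

# The emoji token from the question: thumbs up followed by a skin-tone modifier
emoji <- "\U0001f44d\U0001f3fc"
code_points <- utf8ToInt(emoji)

# Show the code points in the usual U+XXXX notation
print(sprintf("U+%X", code_points))
```

If the code points print correctly but the plot still shows empty boxes, the string itself is probably fine and the font or device is the more likely culprit.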