Tags: r, ggplot2, text-mining, tidytext

Plotting Bigrams in Bar Chart with ggplot2


My data looks like this:

> str(bigrams_joined)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   71319 obs. of  2 variables:
 $ line   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ bigrams: chr  "in practice" "practice risk" "risk management" "management is"

I would like to plot the 10 or 15 most frequent bigrams in my dataset as a bar chart in ggplot2, with horizontal bars and the bigram labels on the y-axis.

Any help with this is greatly appreciated!

Thank you


Solution

  • It looks like you need to count your bigrams with count() from dplyr and then order them in your plot. These days I prefer fct_reorder() from forcats for the ordering step.

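    library(janeaustenr)
    library(tidyverse)
    library(tidytext)
    
    tibble(txt = prideprejudice) %>%
        unnest_tokens(bigram, txt, token = "ngrams", n = 2) %>%
        # lines with fewer than two words give NA bigrams; drop them
        filter(!is.na(bigram)) %>%
        count(bigram, sort = TRUE) %>%
        # keep exactly the 15 most frequent bigrams (no extra rows for ties)
        slice_max(n, n = 15, with_ties = FALSE) %>%
        # put the reordered bigrams on y for horizontal bars (ggplot2 >= 3.3.0)
        ggplot(aes(n, fct_reorder(bigram, n))) +
        geom_col() +
        labs(y = NULL)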
    

    Created on 2018-04-22 by the reprex package (v0.2.0).
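Applied to a data frame shaped like the one in the question, the same pattern might look like the sketch below. The toy `bigrams_joined` here is hypothetical stand-in data that only mimics the `line`/`bigrams` columns shown by `str()`; with the real data you would skip that step. It assumes dplyr >= 1.0.0 (for `slice_max()`) and ggplot2 >= 3.3.0 (for horizontal bars without `coord_flip()`).

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical stand-in for the asker's `bigrams_joined`:
# one row per bigram occurrence, with columns `line` and `bigrams`.
bigrams_joined <- tibble(
  line = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L),
  bigrams = c("in practice", "practice risk", "risk management",
              "management is", "risk management", "in practice",
              "risk management", "management is", NA)
)

# Count each bigram, drop NAs, and keep the 10 most frequent
top_bigrams <- bigrams_joined %>%
  filter(!is.na(bigrams)) %>%
  count(bigrams, sort = TRUE) %>%
  slice_max(n, n = 10, with_ties = FALSE)

# fct_reorder() orders the factor levels by ascending n,
# so the most frequent bigram is drawn at the top of the y-axis
p <- ggplot(top_bigrams, aes(x = n, y = fct_reorder(bigrams, n))) +
  geom_col() +
  labs(x = "Count", y = NULL)

print(p)
```

Without `fct_reorder()`, ggplot2 would order the y-axis alphabetically rather than by frequency.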