My data looks like this:
> str(bigrams_joined)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 71319 obs. of 2 variables:
$ line : int 1 1 1 1 1 1 1 1 1 1 ...
$ bigrams: chr "in practice" "practice risk" "risk management" "management is"
I would like to plot the 10 or 15 most frequently occurring bigrams in my dataset as a bar chart in ggplot2, with the bars running horizontally and the labels on the y-axis.
Any help with this is greatly appreciated!
Thank you
Looks like you need to count() your bigrams (from dplyr), and then you need to order them in your plot. For that, these days I prefer to use something like fct_reorder() from forcats.
library(janeaustenr)
library(tidyverse)
library(tidytext)

# Split the text into bigrams, count them, and keep the 15 most frequent
tibble(txt = prideprejudice) %>%
  unnest_tokens(bigram, txt, token = "ngrams", n = 2) %>%
  count(bigram, sort = TRUE) %>%
  top_n(15) %>%
  # Order the bars by count, then flip so they run horizontally
  ggplot(aes(fct_reorder(bigram, n), n)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL)
#> Selecting by n
Created on 2018-04-22 by the reprex package (v0.2.0).
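The same approach should carry over to the data frame from the question, where the bigrams are already extracted, so the unnest_tokens() step can be skipped. Below is a minimal sketch under that assumption. The small bigrams_joined built here is a made-up stand-in with the same two columns (line, bigrams) as the str() output; the helper name top_bigrams and the axis label are also just illustrative.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical stand-in for the asker's bigrams_joined (same column names,
# invented rows); replace this with the real data frame.
bigrams_joined <- tibble(
  line = c(1L, 1L, 1L, 2L, 2L, 3L),
  bigrams = c("in practice", "practice risk", "risk management",
              "risk management", "in practice", "risk management")
)

# Count each bigram and keep the 15 most frequent (ties are kept too)
top_bigrams <- bigrams_joined %>%
  count(bigrams, sort = TRUE) %>%
  top_n(15, n)

# Reorder the bigrams by count so the bars are sorted, then flip the axes
p <- ggplot(top_bigrams, aes(fct_reorder(bigrams, n), n)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "count")

print(p)
```

Passing n explicitly to top_n() avoids the "Selecting by n" message; change 15 to 10 for a shorter chart.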