I'd like to create a plot from the Text Mining with R web textbook, but with my own data. It essentially finds the top terms per year and graphs them (Figure 5.4: http://tidytextmining.com/dtm.html). My data is a bit cleaner than the data they started with, but I'm new to R. My data has a "Date" column in 2016-01-01 format (it's a date class). I only have data from 2016, so I want to do the same thing but at a more granular level (i.e. by month or by day).
library(tidyr)
library(dplyr)
library(ggplot2)

# inaug_td is the tidied inaugural-address document-term data from earlier in that chapter
year_term_counts <- inaug_td %>%
  extract(document, "year", "(\\d+)", convert = TRUE) %>%
  complete(year, term, fill = list(count = 0)) %>%
  group_by(year) %>%
  mutate(year_total = sum(count))

year_term_counts %>%
  filter(term %in% c("god", "america", "foreign", "union", "constitution",
                     "freedom")) %>%
  ggplot(aes(year, count / year_total)) +
  geom_point() +
  geom_smooth() +
  facet_wrap(~ term, scales = "free_y") +
  scale_y_continuous(labels = scales::percent_format()) +
  ylab("% frequency of word in inaugural address")
The idea is that I would choose specific words from my text and see how they change over the months.
Thank you!
If you want to look at smaller units of time based on a date column that you already have, I would recommend the floor_date() or round_date() functions from lubridate. The particular chapter of our book you linked to deals with taking a document-term matrix and then tidying it, etc. Have you already gotten to a tidy text format for your data? (If not, there is a rough sketch of that step after the code below.) If so, then you could do something like this:
library(dplyr)
library(lubridate)
library(ggplot2)

date_counts <- tidy_text %>%
  mutate(date = floor_date(Date, unit = "7 days")) %>% # use whatever time unit you want here
  count(date, word) %>%
  group_by(date) %>%
  mutate(date_total = sum(n))

date_counts %>%
  filter(word %in% c("PUT YOUR LIST OF WORDS HERE")) %>%
  ggplot(aes(date, n / date_total)) +
  geom_point() +
  geom_smooth() +
  facet_wrap(~ word, scales = "free_y")
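In case your data isn't in a tidy, one-word-per-row format yet, here is a minimal sketch of that step using unnest_tokens() from tidytext. The data frame raw_text and its column names (Date, text) are just placeholders for illustration; swap in whatever your real data frame and text column are called:

library(dplyr)
library(tidytext)

# hypothetical input: one row per document, with your Date column and the raw text
raw_text <- tibble(
  Date = as.Date(c("2016-01-04", "2016-02-15")),
  text = c("some example text for the first document",
           "some example text for the second document")
)

tidy_text <- raw_text %>%
  unnest_tokens(word, text) %>%        # one row per word; Date is carried along on every row
  anti_join(stop_words, by = "word")   # optional: drop common English stop words

Once tidy_text has your Date column plus one word per row, the code above should work as-is; just pick whatever unit for floor_date() ("month", "week", "7 days", etc.) gives you the granularity you want. floor_date() always rounds down to the start of that unit, while round_date() goes to whichever boundary is nearest.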