I have subsetted a set of data from the Consumer Complaints Database. However, I'm having a hard time transforming it into a time series, especially since the same issue is often reported more than once in the same time frame (the rows are not unique). My end goal is to show how often each issue is reported per month in a line plot.
Here are the first few rows of the subsetted data.frame, out of a total of over 750,000 entries:
Date Issue
08/25/14 Making/receiving payments, sending money None
04/20/17 Other
02/14/14 Billing disputes
08/30/13 Managing the loan or lease
10/03/14 Billing disputes
01/07/13 Billing disputes
Something like this?
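# Simulated sample data: 100 rows drawn at random, so repeated Date/Issue pairs are expected
df <- data.frame(
  stringsAsFactors = FALSE,
  Date = sample(c("08/25/14", "04/20/17", "02/14/14", "08/30/13", "10/03/2014",
                  "1/07/2013"), 100, replace = TRUE),
  Issue = sample(c("Making/receiving", "Other", "Billing", "Managing", "Billing",
                   "Billing"), 100, replace = TRUE)
)

library(lubridate)
library(dplyr)
library(ggplot2)

df <- df %>%
  mutate(
    Date = mdy(Date),                    # parse month/day/year strings into Date values
    Year = year(Date),
    Month = month(Date),
    Period = make_date(Year, Month, 1)   # first day of the month, used as the monthly bucket
  ) %>%
  group_by(Period, Issue) %>%
  summarise(
    incidents = n()                      # number of complaints per month and issue
  ) %>%
  ungroup()

# One line per issue; rows are already ordered by Period, so geom_path draws them chronologically
ggplot() +
  geom_path(data = df, mapping = aes(x = Period, y = incidents, colour = Issue))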
Created on 2019-11-19 by the reprex package (v0.3.0)
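For the non-unique rows specifically, a shorter variant of the same idea might look like the sketch below. It snaps each date to the first of its month with floor_date() and lets count() tally the repeats, so duplicated Date/Issue pairs simply raise the count. The data frame name `complaints`, the result name `monthly`, and the six toy rows are assumptions made up for illustration; they are not taken from the real database.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Hypothetical stand-in for the real subset; values are illustrative only
complaints <- data.frame(
  Date  = c("08/25/14", "08/02/14", "02/14/14", "02/20/14", "10/03/14", "10/03/14"),
  Issue = c("Other", "Other", "Billing disputes", "Billing disputes",
            "Billing disputes", "Other"),
  stringsAsFactors = FALSE
)

monthly <- complaints %>%
  # Parse the strings, then round each date down to the first day of its month
  mutate(Period = floor_date(mdy(Date), unit = "month")) %>%
  # One row per month and issue; repeated rows add to the count
  count(Period, Issue, name = "incidents")

# geom_line orders points along the x axis, so row order does not matter here
ggplot(monthly, aes(x = Period, y = incidents, colour = Issue)) +
  geom_line() +
  geom_point()
```

Note that a month in which an issue was never reported produces no row at all, so the line is drawn straight across that gap rather than dropping to zero. If explicit zeros matter for the comparison, something like tidyr::complete() could be used to fill in the missing month and issue combinations before plotting.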