I have a very large data frame in which the first column holds a numeric id for each row. The other columns each hold a categorical variable with two possible values (A or B in this example), one column per year. Here's a simplified data frame as an example:
id var2017 var2018 var2019
1 A B A
2 B A A
3 B A B
4 A A A
5 A B B
I'd like to create a bar plot showing the count of each type (A and B) for each year, with the bars grouped by type. I am new to R, so I've tried creating a plot for each year separately, which works fine, as follows:
graph <- ggplot(data = example) +
  geom_bar(aes(x = var2017))
The problem is I don't know how to put them all together. How can I create a plot with every type for each year on the x axis and the count on the y axis? The id doesn't need to appear in the output.
The way to plot multiple columns in ggplot is to first convert the data to long form, which can be done with tidyr::gather. Then you map the column it came from (now stored in the "year" column) to one aesthetic, and the count to another (geom_bar does this for you by counting the number of rows).
library(tidyverse)

ggplot(data = example %>%
         gather(year, type, -id)) +
  geom_bar(aes(x = year, fill = type), position = "dodge")
(Note: I changed the example so that the years have different counts; otherwise it's harder to see whether it's working.)
example <- read.table(
  header = TRUE,
  stringsAsFactors = FALSE,
text = "id var2017 var2018 var2019
1 A B A
2 B A A
3 B A B
4 B A A # var2017 A changed to B
5 A B B")
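For readers on a recent tidyr, here is a minimal sketch of the same reshape using tidyr::pivot_longer, which the tidyr documentation recommends over gather (gather still works but is marked as superseded). The names example_long, counts, and p are only illustrative, and the dplyr::count step is there purely to show the numbers geom_bar computes on its own; the plot does not need it.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Same modified example data as above
example <- read.table(
  header = TRUE,
  stringsAsFactors = FALSE,
  text = "id var2017 var2018 var2019
1 A B A
2 B A A
3 B A B
4 B A A
5 A B B")

# Long form: one row per (id, year) pair, with the category in "type"
example_long <- pivot_longer(
  example,
  cols = -id,          # reshape every column except id
  names_to = "year",   # former column names (var2017, ...) go here
  values_to = "type"   # the A/B values go here
)

# Optional: the per-year counts that geom_bar derives by counting rows
counts <- count(example_long, year, type)

# Same plot as before, built from the long data
p <- ggplot(example_long) +
  geom_bar(aes(x = year, fill = type), position = "dodge")
print(p)
```

If you would rather have the types on the x axis and one bar per year, swapping the two aesthetics (x = type, fill = year) should give that layout.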