I have a data frame containing various demographic and economic data for every county in the United States. I have successfully summarised this data to give a total for each state. I'm using only particular states in my analysis, outlined below.
[Image: Data Frame]
I'm looking to create a bar graph showing the gender split of each state (how many men and women in each state). I've attempted the following code and received this output:
p1 <- ggplot(MW_15, aes(y="2015 Pop", x=State)) + geom_bar(position="fill", stat="identity")
p1 + ylab("Population")
Is it the formatting of my data, or the code I'm using (most likely a combination of both), that is stopping me from getting a sensible result?
First, it's easier to answer when you put a snippet of your data in your post, as @RuiBarrads already suggested. Second, when using awkward variable names like "2015 Pop" in aes(), you have to put them in backticks, not double quotes. Otherwise ggplot2 treats the quoted text as a constant string rather than as the name of a variable. Third, to plot the population size or shares by gender, you have to convert your df to long format using e.g. tidyr::pivot_longer. This way male and female become categories of one variable, which we can map to the fill aesthetic. Try this:
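library(dplyr)   # provides the %>% pipe
library(tidyr)
library(ggplot2)
# Reshape to long format: every column except State and "2015 Pop" becomes a gender/num pair
p1 <- tidyr::pivot_longer(MW_15, -c("State", "2015 Pop"), names_to = "gender", values_to = "num") %>%
  ggplot(aes(x = State, y = num, fill = gender)) +
  # position = "fill" scales each bar to 1, so the bars show shares by gender
  geom_bar(position = "fill", stat = "identity")
p1 + ylab("Population")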
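Since the original data isn't shown, here is a minimal, self-contained sketch of the same approach on a made-up data frame. The object MW_15_demo, the column names Male and Female, and all numbers are assumptions for illustration only; your real MW_15 may have different gender columns.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for MW_15; column names and values are invented.
MW_15_demo <- data.frame(
  State = c("Illinois", "Ohio", "Iowa"),
  `2015 Pop` = c(1000, 800, 300),
  Male = c(490, 390, 150),
  Female = c(510, 410, 150),
  check.names = FALSE  # keep "2015 Pop" instead of converting it to X2015.Pop
)

# Long format: one row per state and gender.
demo_long <- pivot_longer(
  MW_15_demo,
  cols = c(Male, Female),
  names_to = "gender",
  values_to = "num"
)

# position = "fill" rescales each bar to 1, so the y axis shows shares.
p_share <- ggplot(demo_long, aes(x = State, y = num, fill = gender)) +
  geom_col(position = "fill") +
  ylab("Share of population")

# Backticks refer to the column itself; "2015 Pop" in double quotes
# would be treated as a constant string.
p_total <- ggplot(MW_15_demo, aes(x = State, y = `2015 Pop`)) +
  geom_col() +
  ylab("Population")
```

Printing p_share draws the gender shares per state and p_total draws the total population per state. geom_col() is shorthand for geom_bar(stat = "identity"). If you want absolute counts instead of shares, dropping position = "fill" gives stacked bars, and position = "dodge" puts the bars side by side.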