I have a data set of 600 respondents, with an indicator value for each respondent in each of five years (2013, 2014, 2015, 2016, 2017) and a city column. I want to plot the indicator as a line graph, one line per respondent, with the indicator value on the Y-axis and the year on the X-axis. I have colored the lines by city. I also want to add a median indicator line for each city. I was able to create a single consolidated median line, but I get an error when I try to plot multiple medians. Here is the code I am using:
library(ggplot2)
library(dplyr)
library(tidyr)
library(magrittr)
sample_no <- c(1:600)
city <- c(rep("A",150), rep("B",250), rep("C", 200))
indicator_2013 <- runif(600, min=0, max=1000)
indicator_2014 <- runif(600, min=0, max=1000)
indicator_2015 <- runif(600, min=0, max=1000)
indicator_2016 <- runif(600, min=0, max=1000)
indicator_2017 <- runif(600, min=0, max=1000)
df <- data.frame(sample_no, city, indicator_2013, indicator_2014, indicator_2015, indicator_2016, indicator_2017)
df1 <- df %>%
  gather(indicator_2013, indicator_2014, indicator_2015, indicator_2016, indicator_2017, key = "Year", value = "Indicator")
df1 %>%
  ggplot(aes(x = Year, y = Indicator, color = as.factor(city))) +
  geom_line(aes(group = sample_no), alpha = .5, size = 0.7) +
  labs(col = "City") +
  stat_summary(aes(y = Indicator, group = 1), fun.y = median, geom = "line", color = "black", size = 1)
Note: this is only dummy data, so the graphs are symmetric. I tried the following code to draw multiple median lines, but I get this error: Error: Aesthetics must be either length 1 or the same as the data (5): colour, size
stat_summary(aes(y = Indicator, group =1), fun.y=median, colour=city, geom="line", size =1)
I looked through the documentation and other R blog posts but did not find anything useful.
If I understood you correctly, you just need to change the group argument from 1 to city, so that a separate median is computed for each city:
stat_summary(aes(y = Indicator, group =city)...
Full code:
library(ggplot2)
library(dplyr)
library(tidyr)
library(magrittr)
sample_no <- c(1:600)
city <- c(rep("A",150), rep("B",250), rep("C", 200))
indicator_2013 <- runif(600, min=0, max=1000)
indicator_2014 <- runif(600, min=0, max=1000)
indicator_2015 <- runif(600, min=0, max=1000)
indicator_2016 <- runif(600, min=0, max=1000)
indicator_2017 <- runif(600, min=0, max=1000)
df <- data.frame(sample_no, city, indicator_2013, indicator_2014, indicator_2015, indicator_2016, indicator_2017)
df1 <- df %>%
  gather(indicator_2013, indicator_2014, indicator_2015, indicator_2016, indicator_2017, key = "Year", value = "Indicator")
df1 %>%
  ggplot(aes(x = Year, y = Indicator, color = as.factor(city))) +
  geom_line(aes(group = sample_no), alpha = .5, size = 0.7) +
  labs(col = "City") +
  # group = city draws one median line per city
  # (fun.y is deprecated since ggplot2 3.3.0; use fun = median there)
  stat_summary(aes(y = Indicator, group = city), fun.y = median, geom = "line", color = "black", size = 1)
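Additionally, the color argument cannot sit outside aes() when it refers to a column such as city. Outside aes(), city is treated as a fixed value (here the length-600 vector in the workspace) rather than as a mapping to a column, so its length does not match the 5 summarised rows, which is what the error reports. Here is the correct way if you want the median lines coloured by city: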
stat_summary(aes(y = Indicator, group =city, color = city), fun.y=median, geom="line", size =1)
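As a cross-check on what stat_summary does with group = city, the per-city medians can also be computed explicitly and drawn with a second geom_line. This is only a sketch under assumptions that are not in the original post: the seed, the name city_medians, and the column name median_indicator are made up for illustration.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed so the dummy data is reproducible

# Same dummy data layout as in the question
df <- data.frame(
  sample_no = 1:600,
  city = c(rep("A", 150), rep("B", 250), rep("C", 200)),
  indicator_2013 = runif(600, min = 0, max = 1000),
  indicator_2014 = runif(600, min = 0, max = 1000),
  indicator_2015 = runif(600, min = 0, max = 1000),
  indicator_2016 = runif(600, min = 0, max = 1000),
  indicator_2017 = runif(600, min = 0, max = 1000)
)

# Wide to long: one row per respondent and year
df1 <- df %>%
  gather(indicator_2013:indicator_2017, key = "Year", value = "Indicator")

# One median per city and year (hypothetical helper table)
city_medians <- df1 %>%
  group_by(city, Year) %>%
  summarise(median_indicator = median(Indicator)) %>%
  ungroup()

# Respondent lines plus one median line per city, coloured by city.
# size = 1 follows the post; newer ggplot2 versions call this linewidth.
p <- ggplot(df1, aes(x = Year, y = Indicator, colour = city)) +
  geom_line(aes(group = sample_no), alpha = 0.2) +
  geom_line(data = city_medians,
            aes(y = median_indicator, group = city), size = 1) +
  labs(colour = "City")

print(p)
```

The table has one row per city and year, which makes it easy to inspect the values behind each median line before plotting them.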
[ANSWER TO QUESTION IN THE COMMENT]
Here is the full code:
library(ggplot2)
library(dplyr)
library(tidyr)
library(magrittr)
sample_no <- c(1:600)
city <- c(rep("A",150), rep("B",250), rep("C", 200))
indicator_2013 <- runif(600, min=0, max=1000)
indicator_2014 <- runif(600, min=0, max=1000)
indicator_2015 <- runif(600, min=0, max=1000)
indicator_2016 <- runif(600, min=0, max=1000)
indicator_2017 <- runif(600, min=0, max=1000)
df <- data.frame(sample_no, city, indicator_2013, indicator_2014, indicator_2015, indicator_2016, indicator_2017)
df1 <- df %>%
  gather(indicator_2013, indicator_2014, indicator_2015, indicator_2016, indicator_2017, key = "Year", value = "Indicator")
df1 %>%
  ggplot(aes(x = Year, y = Indicator, color = as.factor(city))) +
  geom_line(aes(group = sample_no), alpha = .5, size = 0.7) +
  labs(col = "City") +
  # fun.y is deprecated since ggplot2 3.3.0; use fun = median there
  stat_summary(aes(y = Indicator, group = city), fun.y = median, geom = "line", color = "black", size = 1) +
  # expand = c(0, 0) removes the padding at both ends of the discrete x axis
  scale_x_discrete(expand = c(0, 0))
You just need to add scale_x_discrete(expand = c(0, 0)) to remove the empty space at both ends of the x axis, so that the lines start at the first factor level.