
Plotting multiple medians in a single plot of panel data


I have a data set of 600 respondents, with an indicator value for each respondent in each of 5 years (2013, 2014, 2015, 2016, 2017) and a city column. I want to plot the indicator as a line graph, one line per respondent, with the indicator value on the Y-axis and the year on the X-axis. The lines are coloured by city. I also want to add a median line for each city, computed over that city's respondents. I was able to create a single consolidated median line, but I get an error when I try to plot multiple medians. Here is the code I am using:

library(ggplot2)
library(dplyr)
library(tidyr)
library(magrittr)

sample_no <- c(1:600)
city <- c(rep("A",150), rep("B",250), rep("C", 200))
indicator_2013 <- runif(600, min=0, max=1000)
indicator_2014 <- runif(600, min=0, max=1000)
indicator_2015 <- runif(600, min=0, max=1000)
indicator_2016 <- runif(600, min=0, max=1000)
indicator_2017 <- runif(600, min=0, max=1000)

df <- data.frame(sample_no, city, indicator_2013, indicator_2014, indicator_2015, indicator_2016, indicator_2017)
df1 <- df %>%
  gather(indicator_2013, indicator_2014, indicator_2015, indicator_2016, indicator_2017, key="Year", value = "Indicator")

df1 %>%
  ggplot(aes(x=Year, y=Indicator, color=as.factor(city))) +
  geom_line(aes(group = sample_no), alpha = .5, size = 0.7) +
  labs(col = "City") +
  stat_summary(aes(y = Indicator, group =1), fun.y=median, geom = "line", color = "black", size = 1)

Note: this is only dummy data, so the graphs are symmetric. I tried the following code to draw multiple median lines, but I get this error: Error: Aesthetics must be either length 1 or the same as the data (5): colour, size

stat_summary(aes(y = Indicator, group =1), fun.y=median, colour=city, geom="line", size =1)

I looked through the documentation and other R blog posts but did not find anything useful.


Solution

  • If I understood you correctly, you just need to change the group argument to city instead of 1:

    stat_summary(aes(y = Indicator, group =city)...
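
    As a cross-check on what group = city makes stat_summary() compute, the per-city, per-year medians can be worked out directly with dplyr. This is only a minimal sketch on a small made-up data frame (toy, toy_long and city_medians are hypothetical names, not from the question); the values it produces are the points each city's median line should pass through.

    ```r
    library(dplyr)
    library(tidyr)

    # Hypothetical stand-in for the question's data: 6 respondents, 2 cities, 2 years
    toy <- data.frame(
      sample_no = 1:6,
      city = c("A", "A", "A", "B", "B", "B"),
      indicator_2013 = c(10, 20, 30, 100, 200, 300),
      indicator_2014 = c(15, 25, 35, 150, 250, 350)
    )

    # Reshape to long format the same way the question does
    toy_long <- toy %>%
      gather(indicator_2013, indicator_2014, key = "Year", value = "Indicator")

    # One median per city and year, i.e. one point per city on each x position
    city_medians <- toy_long %>%
      group_by(city, Year) %>%
      summarise(median_indicator = median(Indicator)) %>%
      ungroup()

    print(city_medians)
    ```

    With group = 1 the same summary would be taken over all respondents at once, which is why the original code draws a single consolidated line.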
    

    Full code:

    library(ggplot2)
    library(dplyr)
    library(tidyr)
    library(magrittr)
    sample_no <- c(1:600)
    city <- c(rep("A",150), rep("B",250), rep("C", 200))
    indicator_2013 <- runif(600, min=0, max=1000)
    indicator_2014 <- runif(600, min=0, max=1000)
    indicator_2015 <- runif(600, min=0, max=1000)
    indicator_2016 <- runif(600, min=0, max=1000)
    indicator_2017 <- runif(600, min=0, max=1000)
    df <- data.frame(sample_no, city, indicator_2013, indicator_2014, indicator_2015, indicator_2016, indicator_2017)
    df1 <- df %>%
      gather(indicator_2013, indicator_2014, indicator_2015, indicator_2016, indicator_2017, key="Year", value = "Indicator")
    df1 %>%
      ggplot(aes(x=Year, y=Indicator, color=as.factor(city))) +
      geom_line(aes(group = sample_no), alpha = .5, size = 0.7) +
      labs(col = "City") +
      stat_summary(aes(y = Indicator, group =city), fun.y=median, geom = "line", color = "black", size = 1)
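
    On recent package versions the code above still runs but may emit deprecation warnings: fun.y was replaced by fun in ggplot2 3.3.0, size for lines by linewidth in ggplot2 3.4.0, and gather() is superseded by pivot_longer() in tidyr 1.0.0. The following is a sketch of an equivalent version, assuming those package versions or later; the seed and the loop that builds the indicator columns are additions for brevity and reproducibility, not part of the original code.

    ```r
    library(ggplot2)
    library(dplyr)
    library(tidyr)

    set.seed(42)  # assumption: added only so the dummy data is reproducible

    df <- data.frame(
      sample_no = 1:600,
      city = rep(c("A", "B", "C"), times = c(150, 250, 200))
    )
    # Same dummy indicators as in the question, one column per year
    for (yr in 2013:2017) {
      df[[paste0("indicator_", yr)]] <- runif(600, min = 0, max = 1000)
    }

    # Long format: one row per respondent and year
    df1 <- df %>%
      pivot_longer(starts_with("indicator_"), names_to = "Year", values_to = "Indicator")

    p <- ggplot(df1, aes(x = Year, y = Indicator, colour = city)) +
      geom_line(aes(group = sample_no), alpha = 0.5, linewidth = 0.7) +
      labs(colour = "City") +
      # One black median line per city
      stat_summary(aes(group = city), fun = median, geom = "line",
                   colour = "black", linewidth = 1)

    print(p)
    ```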
    

    Additionally, the color argument cannot sit outside aes() when its value is a column name such as city. Here is the correct way if you want the median lines coloured by city:

    stat_summary(aes(y = Indicator, group =city, color = city), fun.y=median, geom="line", size =1)
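
    The error in the question follows from how ggplot2 treats arguments outside aes(): they are taken as fixed values rather than looked up in the data. In the question's script, colour=city therefore picks up the 600-element city vector from the workspace, which is neither length 1 nor the length of the 5 summarised points. A small sketch on made-up data illustrates both cases (toy_long, base_plot, mapped and bad are hypothetical names; fun is assumed in place of fun.y, which needs ggplot2 3.3.0 or later):

    ```r
    library(ggplot2)

    # Hypothetical long-format data: 4 respondents, 2 cities, 2 years
    toy_long <- data.frame(
      sample_no = rep(1:4, times = 2),
      city = rep(c("A", "A", "B", "B"), times = 2),
      Year = rep(c("2013", "2014"), each = 4),
      Indicator = c(10, 30, 100, 300, 20, 40, 200, 400)
    )

    base_plot <- ggplot(toy_long, aes(x = Year, y = Indicator))

    # Inside aes(): city is looked up in the data, giving one median line per city
    mapped <- base_plot +
      stat_summary(aes(group = city, colour = city), fun = median, geom = "line")
    layer_medians <- layer_data(mapped, 1)
    print(layer_medians[, c("x", "group", "y", "colour")])

    # Outside aes(): the vector is treated as a constant setting and its length
    # does not match the summarised data, so building the layer is expected to fail
    bad <- tryCatch(
      layer_data(
        base_plot +
          stat_summary(aes(group = 1), fun = median, geom = "line",
                       colour = toy_long$city),
        1
      ),
      error = function(e) e
    )
    print(inherits(bad, "error"))
    ```

    A constant such as color = "black" is fine outside aes() because it has length 1.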
    

    [ANSWER TO QUESTION IN THE COMMENT]

    Here is the full code:

    library(ggplot2)
    library(dplyr)
    library(tidyr)
    library(magrittr)
    sample_no <- c(1:600)
    city <- c(rep("A",150), rep("B",250), rep("C", 200))
    indicator_2013 <- runif(600, min=0, max=1000)
    indicator_2014 <- runif(600, min=0, max=1000)
    indicator_2015 <- runif(600, min=0, max=1000)
    indicator_2016 <- runif(600, min=0, max=1000)
    indicator_2017 <- runif(600, min=0, max=1000)
    df <- data.frame(sample_no, city, indicator_2013, indicator_2014, indicator_2015, indicator_2016, indicator_2017)
    df1 <- df %>%
      gather(indicator_2013, indicator_2014, indicator_2015, indicator_2016, indicator_2017, key="Year", value = "Indicator")
    df1 %>%
      ggplot(aes(x=Year, y=Indicator, color=as.factor(city))) +
      geom_line(aes(group = sample_no), alpha = .5, size = 0.7) +
      labs(col = "City") +
      stat_summary(aes(y = Indicator, group =city), fun.y=median, geom = "line", color = "black", size = 1) + scale_x_discrete(expand=c(0,0)) 
    

    You just need to add scale_x_discrete(expand=c(0,0)) to remove the padding at both ends of the x axis, so that the lines start at the first factor level and end at the last.