Tags: r, for-loop, ggplot2, ggplotly

Why does R behave differently when parsing parameters of plotting?


I am attempting to plot multiple time series variables on a single line chart using ggplot. I have a data.frame which contains n time series variables and a column of time periods. Essentially, I want to loop through the data.frame and add exactly n geom_line layers to a single chart.

Initially I tried the following code, where:

  • df = data.frame containing n time series variables, and 1 column of time periods
  • wid = n (number of time series variables)

p <- ggplot() +
    scale_color_manual(values=c(colours[1:wid]))
for (i in 1:wid) {
    p <- p + geom_line(aes(x=df$Time, y=df[,i], color=var.lab[i]))
}
ggplotly(p)

However, this only produces a plot of the final time series variable in the data.frame. I then investigated further, and found that the following two sets of code produce completely different results:

p <- ggplot() +
    scale_color_manual(values=c(colours[1:wid]))
i = 1
p = p + geom_line(aes(x=df$Time, y=df[,i], color=var.lab[i]))
i = 2
p = p + geom_line(aes(x=df$Time, y=df[,i], color=var.lab[i]))
i = 3
p = p + geom_line(aes(x=df$Time, y=df[,i], color=var.lab[i]))
ggplotly(p)

Plot produced by code above

p <- ggplot() +
    scale_color_manual(values=c(colours[1:wid]))
p = p + geom_line(aes(x=df$Time, y=df[,1], color=var.lab[1]))
p = p + geom_line(aes(x=df$Time, y=df[,2], color=var.lab[2]))
p = p + geom_line(aes(x=df$Time, y=df[,3], color=var.lab[3]))
ggplotly(p)

Plot produced by code above

In my mind, these two sets of code are identical, so could anyone explain why they produce such different results?

I know this could probably be done quite easily using autoplot, but I am more interested in the behavior of these two snippets of code.


Solution

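  • Adding one geom_line per column in a loop works against how ggplot2 is designed, so treat it as a hack. Your loop fails because aes() does not evaluate its arguments when the layer is created: y=df[,i] is only evaluated when the plot is drawn, by which point i holds its final value, so every layer ends up drawing the same column. If you do want the loop, I'd use aes_string, because the column name is passed as a string that is resolved immediately. But it's a hack.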

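    library(ggplot2)

    # Example data: three series observed at 20 time points
    df <- data.frame(Time = 1:20,
                     Var1 = rnorm(20),
                     Var2 = rnorm(20, mean = 0.5),
                     Var3 = rnorm(20, mean = 0.8))

    vars <- paste0("Var", 1:3)
    col_vec <- RColorBrewer::brewer.pal(3, "Accent")

    # Add one layer per column. aes_string() takes the column name as a string
    # and color is set outside aes(), so both are evaluated on each iteration.
    p <- ggplot(df, aes(Time))
    for (i in seq_along(vars)) {
        p <- p + geom_line(aes_string(y = vars[i]), color = col_vec[i], lwd = 1)
    }
    p + labs(y = "value")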
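
The difference between the two snippets in the question is consistent with aes() storing its arguments as unevaluated expressions and only resolving them when the plot is built. The sketch below checks this on a small made-up data frame (the values and the names p_loop, p_ok, first_y_loop and first_y_ok are assumptions for illustration). It also shows one possible alternative to aes_string, which current ggplot2 releases report as deprecated: creating each layer inside a function and using the .data pronoun.

```r
library(ggplot2)

# Small deterministic data set (hypothetical values, for illustration only).
df <- data.frame(Time = 1:5, Var1 = 1:5, Var2 = 11:15, Var3 = 21:25)
vars <- paste0("Var", 1:3)

# 1. The failing pattern: aes() keeps `df[, vars[i]]` as an unevaluated
#    expression, so `i` is only looked up when the plot is built.
p_loop <- ggplot(df, aes(Time))
for (i in seq_along(vars)) {
  p_loop <- p_loop + geom_line(aes(y = df[, vars[i]]))
}

# After the loop i is 3, so all three layers resolve to Var3.
first_y_loop <- sapply(seq_along(vars),
                       function(k) layer_data(p_loop, k)$y[1])

# 2. One possible fix: build each layer inside a function. Every call has
#    its own environment, so `v` is bound separately for each layer.
layers <- lapply(vars, function(v) {
  geom_line(aes(y = .data[[v]], color = v))
})
p_ok <- ggplot(df, aes(Time)) + layers

# Each layer now starts at the first value of its own column.
first_y_ok <- sapply(seq_along(vars),
                     function(k) layer_data(p_ok, k)$y[1])

print(first_y_loop)  # 21 21 21
print(first_y_ok)    # 1 11 21
```

layer_data() is used here only to inspect what each layer would draw, so the check works without opening a graphics device.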
    


    How to do it properly

    To build this plot idiomatically, pivot the data first so that each aesthetic (aes) is mapped to a single variable in your data frame. Here that means one column holding the series name, which we map to color. Hence, we pivot_longer and plot again:

    library(tidyr)
    df_melt <- pivot_longer(df, cols = Var1:Var3, names_to = "var")
    
    ggplot(df_melt, aes(Time, value, color = var)) +
        geom_line(lwd = 1) +
        scale_color_manual(values = col_vec)
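
The same approach can be sketched for an arbitrary number of series, which is closer to the n-variable case in the question. The example below is only an illustration under assumptions: the simulated data, the value of n, and the hcl.colors palette are made up, and cols = -Time simply selects every column except Time.

```r
library(ggplot2)
library(tidyr)

set.seed(1)

# Hypothetical wide data: one Time column plus n simulated series.
n <- 4
df <- data.frame(Time = 1:20)
for (j in seq_len(n)) {
  df[[paste0("Var", j)]] <- rnorm(20, mean = j / 2)
}

# Pivot every column except Time, so the code does not depend on n.
df_long <- pivot_longer(df, cols = -Time,
                        names_to = "var", values_to = "value")

# A named colour vector ties each colour to a specific series
# (the palette choice is an assumption).
col_vec <- setNames(grDevices::hcl.colors(n, "Dark 3"),
                    paste0("Var", seq_len(n)))

# A single geom_line layer draws all n series, grouped by colour.
p <- ggplot(df_long, aes(Time, value, color = var)) +
  geom_line() +
  scale_color_manual(values = col_vec)
```

Because the series name is now an ordinary column, no loop is needed and the legend is produced from the data. If an interactive chart is wanted, p can be passed to ggplotly as in the question.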
    
