
ggplot: set line colours by a variable when the types present in the data change from run to run


Assume the following data frame:

mydf <- data.frame(date = as.Date(rep(c('2019-11-01', '2019-10-01'), 2)), 
                    value = c(10, 15, 8, 4),
                    type = c('Type 1', 'Type 1', 'Type 2', 'Type 2'))

print(mydf)
        date value   type
1 2019-11-01    10 Type 1
2 2019-10-01    15 Type 1
3 2019-11-01     8 Type 2
4 2019-10-01     4 Type 2

I want to write automated code that draws one line per type and sets the colour of each line. In general, I know how to do that:

library(ggplot2)
# One colour per type, given in order
myplot <- ggplot(mydf, aes(x = date, y = value, colour = type)) + geom_line() +
  scale_color_manual(name = 'Type', values = c('blue', 'red'))

However, the data frame may change when I run the code in another month. For example, it might contain a Type 3:

mydf <- data.frame(date = as.Date(rep(c('2019-11-01', '2019-10-01'), 3)), 
                    value = c(10, 15, 8, 4, 12, 8),
                    type = c('Type 1', 'Type 1', 'Type 2', 'Type 2', 'Type 3', 'Type 3'))

print(mydf)
        date value   type
1 2019-11-01    10 Type 1
2 2019-10-01    15 Type 1
3 2019-11-01     8 Type 2
4 2019-10-01     4 Type 2
5 2019-11-01    12 Type 3
6 2019-10-01     8 Type 3

And in yet another month, Type 1 or Type 2 might be missing from the data:

mydf <- data.frame(date = as.Date(rep(c('2019-11-01', '2019-10-01'), 2)), 
                    value = c(10, 15, 8, 4),
                    type = c('Type 1', 'Type 1', 'Type 3', 'Type 3'))

print(mydf)
        date value   type
1 2019-11-01    10 Type 1
2 2019-10-01    15 Type 1
3 2019-11-01     8 Type 3
4 2019-10-01     4 Type 3

How can I define the colours for Type 1, Type 2 and Type 3 once, and then have each plot use the predefined colour of whichever types are present in the data? That way I could run the script on new data without changing anything in my code. For every plot of the three data frames above, Type 1 should be blue, Type 2 red and Type 3 black. Thanks!
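To make the problem concrete: with an unnamed values vector, the colours are handed out in order to whichever types are present, so in the month with only Type 1 and Type 3, Type 3 picks up the second colour. A minimal sketch, assuming a ggplot2 version that provides layer_data() (it returns the built data of a layer, including the mapped colour and the group id of every point):

```r
library(ggplot2)

# The month from the question in which only Type 1 and Type 3 are present
mydf <- data.frame(date  = as.Date(rep(c('2019-11-01', '2019-10-01'), 2)),
                   value = c(10, 15, 8, 4),
                   type  = c('Type 1', 'Type 1', 'Type 3', 'Type 3'))

# Unnamed values, as in the first snippet: colours go to the present types in order
p_unnamed <- ggplot(mydf, aes(x = date, y = value, colour = type)) +
  geom_line() +
  scale_color_manual(name = 'Type', values = c('blue', 'red'))

# layer_data() returns the built data of the first layer; group 1 is Type 1
# and group 2 is Type 3, so this shows the colour each type was drawn with
ld <- layer_data(p_unnamed)
ld$colour[match(1:2, ld$group)]
# "blue" "red": Type 3 is drawn red, not the intended black
```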


Solution

  • The values argument of scale_color_manual() can take a named vector, which assigns each colour to the Type with the matching name.

    library(ggplot2)
    
    cols <- c('Type 1' = 'blue', 'Type 2' = 'red', 'Type 3' = 'black')
    
    ggplot(mydf, aes(x = date, y = value, colour = type)) + geom_line() +
      scale_color_manual(name = 'Type', values = cols)
    

    So when all types are present in the data, as here, the plot looks like this:

    mydf <- data.frame(date = as.Date(rep(c('2019-11-01', '2019-10-01'), 3)), 
                       value = c(10, 15, 8, 4, 12, 8),
                       type = c('Type 1', 'Type 1', 'Type 2', 'Type 2', 'Type 3', 'Type 3'))
    

    [Plot: Type 1 in blue, Type 2 in red, Type 3 in black]

    And when some types are absent, the same code still gives each remaining type its predefined colour:

    mydf <- data.frame(date = as.Date(rep(c('2019-11-01', '2019-10-01'), 2)), 
                       value = c(10, 15, 8, 4),
                       type = c('Type 1', 'Type 1', 'Type 3', 'Type 3'))
    

    [Plot: Type 1 in blue, Type 3 in black]
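Because the colours are matched by name, the plotting code can be wrapped once and reused every month. A minimal sketch, where plot_types() and used_colours() are hypothetical helper names (not part of ggplot2), and layer_data() is used only to read back the colour each type was actually drawn with:

```r
library(ggplot2)

# One fixed colour per type that can ever occur (named vector, as in the answer)
type_cols <- c('Type 1' = 'blue', 'Type 2' = 'red', 'Type 3' = 'black')

# Hypothetical helper: the same plotting code, reused for any month's data
plot_types <- function(df, cols = type_cols) {
  ggplot(df, aes(x = date, y = value, colour = type)) +
    geom_line() +
    scale_color_manual(name = 'Type', values = cols)
}

# Hypothetical helper: read back the colour each type was drawn with.
# layer_data() has one row per point; its group column numbers the types
# in sorted order, so it lines up with sort(unique(df$type)).
used_colours <- function(df) {
  ld <- layer_data(plot_types(df))
  types <- sort(unique(df$type))
  setNames(ld$colour[match(seq_along(types), ld$group)], types)
}

# Two "months" from the question: all three types, and Type 1 + Type 3 only
months <- list(
  all_three = data.frame(date  = as.Date(rep(c('2019-11-01', '2019-10-01'), 3)),
                         value = c(10, 15, 8, 4, 12, 8),
                         type  = rep(c('Type 1', 'Type 2', 'Type 3'), each = 2)),
  one_and_three = data.frame(date  = as.Date(rep(c('2019-11-01', '2019-10-01'), 2)),
                             value = c(10, 15, 8, 4),
                             type  = rep(c('Type 1', 'Type 3'), each = 2))
)

res <- lapply(months, used_colours)
res
# all_three:     Type 1 "blue", Type 2 "red", Type 3 "black"
# one_and_three: Type 1 "blue", Type 3 "black"
```

Types missing from a month's data are simply not drawn, and the remaining ones keep their colours. A type that is not listed in type_cols would not get one of the predefined colours, so the named vector needs an entry for every type that can occur.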