
Best Approach to manipulate level colors in a scatterplot - ggplot2 (layering plots and/or assigning colors to specific row values/or something else?)


Here is a sample of my data frame:

data <- data.frame(avgHR = c(96,115,130,120,112,87,112,114,116,140),
                   DetailCount = c(5,18,4,3,9,2,10,11,19,15),
                   PID = c(1,1,1,1,1,2,2,2,2,2), 
                   Section = c("lab","s1","s2","s3","s4","lab","s1","s2","s3","s4"))
data$Section<-as.factor(data$Section)
data$PID<-as.factor(data$PID)
data$DetailCount<-as.integer(data$DetailCount)

I am plotting a scatterplot with individual prediction lines for each participant and an overall regression line for the sample.

Here is a picture of my plot:

[image: scatterplot with a prediction line per participant and an overall regression line]

This is the code I am using to plot the above figure:

ggplot(data = data, aes(x = avgHR, y = DetailCount)) +
   geom_point(aes(colour=factor(PID)), alpha=.75, size = 2) +
   scale_color_manual(values=coul) +
   geom_smooth(aes(colour=factor(PID)), method="lm", alpha=.75, size = .5, se=FALSE) +
   geom_smooth(method="lm", color="black") +
   theme(legend.position="none")

The variable coul is my custom palette:

coul<-c("#666699","#CCCCFF","#996699","#FFFFCC","#99CC99","#336666","#006699","#000066")
coul<-colorRampPalette(coul)(50)
pie(rep(1, length(coul)), col = coul, main="")

As you can see in the data frame, one level of the "Section" factor is "lab". I want the points for all other sections to be colored from the coul palette above, but all of the "lab" points to be colored from this lab palette:

library(RColorBrewer) # provides brewer.pal()
lab<-brewer.pal(n=9, name="YlOrRd")
lab<-colorRampPalette(lab)(45)
pie(rep(1, length(lab)), col=lab, main="")

If I were plotting the lab points alone, this is what the scatterplot would look like:

[image: scatterplot of the lab points only]

Here is the code for that scatterplot:

ggplot(data=labstats, aes(x=avgHR, y=DetailCount)) +
  geom_point(aes(colour=factor(PID))) +
  scale_color_manual(values=lab) +
  theme(legend.position = "none")

And here is a picture of the lab-only data frame (labstats) I am using for that:

[image: the lab-only data frame]

What is the best approach to making a plot that shows all the lab points in my lab palette and the rest of the points in the coul palette?

I have been looking into layering the two plots, because that seems easiest in theory, but I am having a hard time combining them without errors while using two different data frames. I also considered assigning the lab palette to specific rows within the original plot, but that seems difficult, and I don't think ggplot is equipped to do that. Please correct me if I am wrong and/or help me figure this out. Also, I am using palettes because each participant needs their own discrete color value for each data point. Thanks in advance.


Solution

  • Since you want a unique color per combination of Section and PID, you can map color to their interaction. To get the right color for each combination, create a named vector whose names are the interaction levels, and pass that named vector to scale_color_manual.

    library(ggplot2)
    library(RColorBrewer)
    library(scales) # only used to display color palettes for debugging purposes
    
    data <- data.frame(avgHR = c(96,115,130,120,112,87,112,114,116,140),
                            DetailCount = c(5,18,4,3,9,2,10,11,19,15),
                            PID = c(1,1,1,1,1,2,2,2,2,2), 
                            Section = c("lab","s1","s2","s3","s4","lab","s1","s2","s3","s4"))
    data$Section<-as.factor(data$Section)
    data$PID<-as.factor(data$PID)
    data$DetailCount<-as.integer(data$DetailCount)
    
    # generate palettes per group
    num_lab <- sum(data$Section == 'lab')
    num_non_lab <- sum(data$Section != 'lab')
    
    coul<-c("#666699","#CCCCFF","#996699","#FFFFCC","#99CC99","#336666","#006699","#000066")
    coul<-colorRampPalette(coul)(num_non_lab)
    
    lab<-brewer.pal(n=9, name="YlOrRd")
    lab<-colorRampPalette(lab)(num_lab)
    
    # display the color palettes
    show_col(coul)
    show_col(lab)
    
    # create the combined palette, where participants from each group get assigned a color from
    # the palette for their group
    color_palette <- array(NA, nrow(data))
    color_palette[data$Section == 'lab'] <- lab
    color_palette[data$Section != 'lab'] <- coul
    names(color_palette) <- interaction(data$PID, data$Section)
    
    # plot the data
    ggplot(data = data, aes(x = avgHR, y = DetailCount, color = interaction(PID, Section))) +
      geom_point() + scale_color_manual(values=color_palette)
    

    [image: resulting scatterplot, one color per PID and Section combination]
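If the goal is one color per participant rather than one per PID and Section combination, another option is to store the hex color for each row in the data frame and let ggplot use it as-is via scale_colour_identity(). The sketch below is only one reading of the question: it assumes each participant gets one color from each palette, the column names point_col and line_col are made up for illustration, and the lab ramp starts at the third YlOrRd color so that it cannot reuse the pale yellow that also appears in coul.

```r
library(ggplot2)
library(RColorBrewer)

data <- data.frame(
  avgHR = c(96, 115, 130, 120, 112, 87, 112, 114, 116, 140),
  DetailCount = c(5, 18, 4, 3, 9, 2, 10, 11, 19, 15),
  PID = factor(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)),
  Section = factor(c("lab", "s1", "s2", "s3", "s4", "lab", "s1", "s2", "s3", "s4"))
)

n_pid <- nlevels(data$PID)

# One color per participant from each palette
coul_pid <- colorRampPalette(c("#666699", "#CCCCFF", "#996699", "#FFFFCC",
                               "#99CC99", "#336666", "#006699", "#000066"))(n_pid)
# Assumption: drop the two palest YlOrRd colors to avoid overlap with coul
lab_pid <- colorRampPalette(brewer.pal(n = 9, name = "YlOrRd")[3:9])(n_pid)

# Hypothetical helper columns holding the literal hex color for each row
pid_index <- as.integer(data$PID)
data$point_col <- ifelse(data$Section == "lab", lab_pid[pid_index], coul_pid[pid_index])
data$line_col <- coul_pid[pid_index]

p <- ggplot(data, aes(x = avgHR, y = DetailCount)) +
  geom_point(aes(colour = point_col), alpha = 0.75, size = 2) +
  # per-participant fit lines, drawn in each participant's coul color
  geom_smooth(aes(group = PID, colour = line_col),
              method = "lm", formula = y ~ x, se = FALSE) +
  # overall regression line for the whole sample
  geom_smooth(method = "lm", formula = y ~ x, colour = "black") +
  # use the hex values in point_col / line_col directly, with no legend
  scale_colour_identity()

p
```

Because the colors live in the data, no second data frame or second plot is needed. The trade-off is that scale_colour_identity() draws no legend by default, which matches the legend.position = "none" used in the question.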

    I also noticed while plotting that the lab and non-lab palettes share a color (hex #FFFFCC), which is why that color appears for points in both groups. You should probably choose palettes that don't overlap.

    'Coul' Palette:

    [image: swatches of the coul palette]

    'Lab' Palette:

    [image: swatches of the lab palette]
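The shared color can also be found without inspecting the swatches, by intersecting the two vectors of hex codes. The sketch below uses the palette sizes from the example data (8 non-lab rows, 2 lab rows). Trimming the lab ramp to start at the third YlOrRd color is just one possible way to remove the overlap, and lab_trimmed is a name made up for illustration.

```r
library(RColorBrewer)

coul_base <- c("#666699", "#CCCCFF", "#996699", "#FFFFCC",
               "#99CC99", "#336666", "#006699", "#000066")
lab_base <- brewer.pal(n = 9, name = "YlOrRd")

# Same sizes as in the example data: 8 non-lab rows and 2 lab rows
coul <- colorRampPalette(coul_base)(8)
lab <- colorRampPalette(lab_base)(2)

# Hex codes that occur in both palettes
shared <- intersect(coul, lab)
print(shared)

# Assumption: starting the lab ramp at a darker YlOrRd color avoids the clash
lab_trimmed <- colorRampPalette(lab_base[3:9])(2)
print(intersect(coul, lab_trimmed))
```

An empty result from the second intersect() means no point in one group can be mistaken for a point in the other on color alone.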