
ggplot2 only plotting last variable from grouped dataframe


I am trying to plot 3 models with group=model, but only the last model in the data.frame is being plotted. I've done a fair amount of searching online but haven't been able to figure out what the problem is.

The data.frame has 3 columns: dose (x axis), p1 (y axis), and model (mod1, mod2, mod3).

library(ggplot2)

ggplot(df, aes(x=dose,y=p1, group=model))+
   geom_line(aes(color=model))+
   geom_step(aes(x=dose,y=p1,group=model, col=model))+
   coord_trans(x = 'log10', limx = c(0.01,200), limy=c(0.01,0.05)) +

   # Assign colors to mod1, mod2, mod3
   scale_color_manual(values = c("mod1" = "darkgreen", "mod2" = "blue", "mod3" = "red"))

# Log aesthetics cut for brevity of code (Help with log ticks)

[image: the resulting plot]

It isn't an issue of mod1 and mod2 being plotted outside of the frame.

If it is of any relevance: The data I have included below (which I created the graphic above with) only includes a few points from the lower bound of 3 models - The actual data set contains 8 models, each with 100 points.

df <- structure(list(dose = c(0.5, 0.565608284362011, 0.639825462677876, 
  0.723781164472726, 0.818753245381915, 0.5, 0.565608284362011, 
  0.639825462677876, 0.723781164472726, 0.818753245381915, 0.926187236872587, 
  0.5, 0.565608284362011, 0.639825462677876, 0.723781164472726, 
  0.818753245381915, 0.926187236872587, 1.04771834809099, 1.18519635471669, 
  1.34071375364684, 1.51663761204148, 1.71564559549136), p1 = c(0.0103075076812739, 
  0.0116952370538794, 0.0132672958208565, 0.0150474513444454, 0.017062331184752, 
  0.0103075076812739, 0.0116952370538794, 0.0132672958208565, 0.0150474513444454, 
  0.017062331184752, 0.0193417088954045, 0.0103075076812739, 0.0116952370538794, 
  0.0132672958208565, 0.0150474513444454, 0.017062331184752, 0.0193417088954045, 
  0.0219188012260592, 0.0248305714535104, 0.0281180313564242, 0.0318265316115288, 
  0.036006027058739), model = c("mod1", "mod1", "mod1", "mod1", 
  "mod1", "mod2", "mod2", "mod2", "mod2", "mod2", "mod2", "mod3", 
  "mod3", "mod3", "mod3", "mod3", "mod3", "mod3", "mod3", "mod3", 
  "mod3", "mod3")), .Names = c("dose", "p1", "model"), row.names = c(1L, 
  2L, 3L, 4L, 5L, 101L, 102L, 103L, 104L, 105L, 106L, 201L, 202L, 
  203L, 204L, 205L, 206L, 207L, 208L, 209L, 210L, 211L), class = "data.frame")

Solution

  • From what I understand, the problem is that the points lie on top of each other: the models share identical (dose, p1) values over their common range, so you can only see the most complete line (mod3), and the other ones are hidden underneath it.
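
    A quick way to check for this kind of overlap is to count how many models share each (dose, p1) pair. The sketch below uses a small made-up data frame (toy, not the question's df) with the same structure; the values are assumptions chosen only for illustration.

```r
# Made-up data with the same shape as the question's df: each model repeats
# the previous model's points and then extends a little further.
toy <- data.frame(
  dose  = c(0.5, 0.6, 0.5, 0.6, 0.7, 0.5, 0.6, 0.7, 0.8),
  p1    = c(0.010, 0.012, 0.010, 0.012, 0.013, 0.010, 0.012, 0.013, 0.015),
  model = rep(c("mod1", "mod2", "mod3"), times = c(2, 3, 4))
)

# Build one key per (dose, p1) point and count the distinct models on it.
key <- paste(toy$dose, toy$p1)
n_models <- tapply(toy$model, key, function(m) length(unique(m)))
n_models

# Number of rows that repeat a point already present in an earlier row.
n_dup <- sum(duplicated(toy[c("dose", "p1")]))
n_dup
```

    Any point with a count above 1 is drawn once per model at exactly the same position, so only the model drawn last is visible there.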

    With some basic data wrangling we can get rid of the duplicates and get something like this:

    library(ggplot2)
    library(tidyr)
    library(dplyr)

    # Go wide (one column per model), blank out points already covered by a
    # lower-numbered model, then go back to long format.
    df1 <- df %>%
      spread(model, p1) %>%
      mutate(mod3 = ifelse(!is.na(mod2 * mod3), NA, mod3),
             mod2 = ifelse(!is.na(mod2 * mod1), NA, mod2)) %>%
      gather(model, p1, mod1:mod3) %>%
      filter(!is.na(p1)) %>%
      arrange(p1)

    # Duplicate the last mod1 point and the first remaining mod3 point, and
    # relabel both copies as mod2 so the mod2 segment joins its neighbours.
    df1 <- rbind(df1, df1[which(df1$p1 == max(df1[df1$model == "mod1", ]$p1)), ])
    df1 <- rbind(df1, df1[which(df1$p1 == min(df1[df1$model == "mod3", ]$p1)), ])
    df1[nrow(df1), 2] <- "mod2"
    df1[nrow(df1) - 1, 2] <- "mod2"

    ggplot(df1, aes(x = dose, y = p1, group = model)) +
      geom_line(aes(color = model)) +
      geom_step(aes(color = model)) +
      coord_trans(x = "log10", limx = c(0.01, 200), limy = c(0.01, 0.05))
    

    Outputs the following image:

    [image: the resulting plot]
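
    A possible alternative that keeps every row is to change the drawing order instead of removing duplicates. This is only a sketch, and it assumes that ggplot2 draws groups in factor-level order, with later levels on top; the toy data frame is made up and only mimics the structure of the question's df.

```r
library(ggplot2)

# Made-up data with the same shape as the question's df: each model repeats
# the previous model's points and then extends a little further.
toy <- data.frame(
  dose  = c(0.5, 0.6, 0.5, 0.6, 0.7, 0.5, 0.6, 0.7, 0.8),
  p1    = c(0.010, 0.012, 0.010, 0.012, 0.013, 0.010, 0.012, 0.013, 0.015),
  model = rep(c("mod1", "mod2", "mod3"), times = c(2, 3, 4))
)

# Assumption: groups are drawn in level order, so the last level ends up on top.
# Listing mod1 last means the shortest line is drawn over the longer ones.
toy$model <- factor(toy$model, levels = c("mod3", "mod2", "mod1"))

p <- ggplot(toy, aes(x = dose, y = p1, group = model, color = model)) +
  geom_line() +
  scale_x_log10()
p
```

    If the assumption holds, each model should stay visible on the stretch where it is the shortest line still present, which is roughly what the wrangling above achieves by hand. The legend order follows the reversed levels.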