
How to add a second smooth line on a secondary y-axis and scale it relative to the first y-axis with ggplot2?


I have a data frame df that summarizes the activity (Activity) of a fish species, together with the intensity (C.I) and direction (C.D) of the current associated with that activity in the water column. An example data frame:

df<- data.frame(C.D=c(5,5,5,10,10,10,20,20,20,40,40,40,80,80,80,100,100,100,130,130,130,160,160,160,190,190,190,220,220,220,250,250,250,280,280,280,310,310,310,340,340,340,359,359,359),
                Activity=c(1.1,1.6,0.6,1.2,1.8,1.3,1.3,1.4,1.88,0.99,1.8,2.1,1.75,1.5,2.4,1.55,0.9,2.4,1.4,1.5,3.2,1.7,2.1,3.8,2.8,3.9,2.1,3.4,2.6,4.1,2.3,3.6,4.3,3.0,2.4,1.8,2.5,1.6,1.1,0.5,1.4,2.3,0.8,2.1,1.5),
                C.I=c(0.05,0.21,0.11,0.2,0.15,0.28,0.24,0.18,0.33,0.11,0.22,0.13,0.16,0.31,0.23,0.15,0.28,0.36,0.25,0.31,0.58,0.42,0.36,0.52,0.58,0.82,0.71,0.64,0.51,0.4,0.54,0.55,0.68,0.32,0.21,0.23,0.37,0.22,0.15,0.21,0.24,0.18,0.04,0.6,0.12))

df

   C.D Activity  C.I
1    5     1.10 0.05
2    5     1.60 0.21
3    5     0.60 0.11
4   10     1.20 0.20
.    .       .    .
.    .       .    .
.    .       .    . 

I want to explore whether the current's direction C.D affects the activity of my fish species, for instance whether activity is higher for some values of C.D than for others. However, C.D and C.I may be strongly related (for some C.D the current's intensity C.I may be higher than for others), so I need to include C.I in my plot to judge whether what I see is an effect of C.D or of the third variable C.I.

As a first approximation, I plotted Activity against C.D and added a smooth line to show the general trend. I also coloured the points by C.I to see whether the colours follow a pattern (for instance, whether specific colours are concentrated at specific C.D, which would mean that some C.I only occur with specific C.D). In the example, high C.I values are associated with C.D between 140 and 250 degrees. The code and the image are below:

library(ggplot2)
# Scatter plot of Activity against C.D, with points coloured by C.I
P <- ggplot(df, aes(C.D, Activity)) +
  geom_point(aes(color = C.I)) +
  scale_colour_gradientn(colours = c("green", "black")) +
  theme_bw()
# Add the default smoother and a centred title
P <- P + geom_smooth() +
  ggtitle("Mean activity as a function of C.D.20m for winter from hourly data") +
  theme(plot.title = element_text(hjust = 0.5))

enter image description here

My problem arises when I have to plot thousands of points: colouring the points is then no longer a suitable way to reveal any C.I pattern associated with C.D. Here is a real plot of my data:

enter image description here

My question is how to add a second smooth line, scaled relative to the first y-axis, that shows the relationship between C.D and C.I. This is what I have so far:

P<- P + geom_smooth(aes(C.D, C.I), color="red", se=FALSE)
P

enter image description here

Is it possible to scale the second y-axis to improve the interpretation?


Solution

  • First, I'd like to point out the usual warnings that go with secondary axes, expressed in this answer elsewhere.

    That said, would it not be appropriate to simply transform your data and apply the inverse transformation to the secondary axis?

    Note that 6 is an arbitrary factor, chosen only so that the rescaled C.I curve sits in a range comparable to Activity.

    # C.I is multiplied by 6 for plotting; the secondary axis divides by 6
    # so that its labels are in the original C.I units.
    ggplot(df, aes(C.D, Activity)) +
      geom_point(aes(color = C.I)) + 
      scale_colour_gradientn(colours = c("green", "black")) + 
      theme_bw() + 
      geom_smooth() +
      ggtitle("Mean activity as a function of C.D.20m for winter from hourly data") +
      theme(plot.title = element_text(hjust = 0.5)) +
      geom_smooth(aes(C.D, C.I * 6), se = FALSE, colour = "red", show.legend = TRUE) +
      # Transformation passed positionally: newer ggplot2 releases rename `trans` to `transform`
      scale_y_continuous(sec.axis = sec_axis(~ . / 6, name = "C.I"))
    

    enter image description here
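
The factor does not have to be picked by eye. As a minimal sketch, assuming that the ratio of the two maxima is an acceptable choice, it could be derived from the data. The name k is hypothetical and not part of the original answer, and the 12-row df below is only a subset of the question's data so that the example runs on its own:

```r
library(ggplot2)

# Subset of the question's df (first row of several C.D groups)
df <- data.frame(
  C.D      = c(5, 40, 80, 130, 160, 190, 220, 250, 280, 310, 340, 359),
  Activity = c(1.1, 0.99, 1.75, 1.4, 1.7, 2.8, 3.4, 2.3, 3.0, 2.5, 0.5, 0.8),
  C.I      = c(0.05, 0.11, 0.16, 0.25, 0.42, 0.58, 0.64, 0.54, 0.32, 0.37, 0.21, 0.04)
)

# Assumption: scale so that the largest C.I lands on the largest Activity
k <- max(df$Activity) / max(df$C.I)

p <- ggplot(df, aes(C.D, Activity)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x) +
  # Plot C.I in Activity units ...
  geom_smooth(aes(y = C.I * k), method = "loess", formula = y ~ x,
              colour = "red", se = FALSE) +
  # ... and undo the scaling on the secondary axis labels
  scale_y_continuous(sec.axis = sec_axis(~ . / k, name = "C.I")) +
  theme_bw()

p
```

Any other one-to-one transformation would work the same way, as long as the secondary axis applies its exact inverse.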

    EDIT: For proper legends for the lines, I'm afraid you would have to do a bit of manual specification (unless someone else has a more elegant solution):

    # Mapping linetype to a constant label in each smoother creates a legend entry per line.
    ggplot(df, aes(C.D, Activity)) +
      geom_point(aes(color = C.I)) + 
      scale_colour_gradientn(colours = c("green", "black")) + 
      theme_bw() + 
      geom_smooth(aes(linetype = "Activity")) +
      ggtitle("Mean activity as a function of C.D.20m for winter from hourly data") +
      theme(plot.title = element_text(hjust = 0.5)) +
      geom_smooth(aes(C.D, C.I * 6, linetype = "C.I."), se = FALSE, colour = "red", show.legend = TRUE) +
      # Transformation passed positionally: newer ggplot2 releases rename `trans` to `transform`
      scale_y_continuous(sec.axis = sec_axis(~ . / 6, name = "C.I")) +
      # Both lines stay solid; the legend keys are recoloured to match the lines
      scale_linetype_manual(
        values = c(1, 1), 
        guide = guide_legend(override.aes = list(colour = c("blue", "red")))
      )
    

    enter image description here
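
One possible alternative, assuming the points no longer need to be coloured by C.I (as the question suggests for the large data set): the colour aesthetic is then free, so each smoother can map a constant label to colour and scale_colour_manual() can assign the actual colours. This is only a sketch, and the 12-row df is a subset of the question's data so that it runs on its own:

```r
library(ggplot2)

# Subset of the question's df (first row of several C.D groups)
df <- data.frame(
  C.D      = c(5, 40, 80, 130, 160, 190, 220, 250, 280, 310, 340, 359),
  Activity = c(1.1, 0.99, 1.75, 1.4, 1.7, 2.8, 3.4, 2.3, 3.0, 2.5, 0.5, 0.8),
  C.I      = c(0.05, 0.11, 0.16, 0.25, 0.42, 0.58, 0.64, 0.54, 0.32, 0.37, 0.21, 0.04)
)

k <- 6  # same arbitrary factor as in the answer above

p <- ggplot(df, aes(C.D, Activity)) +
  # Points get a fixed colour, so the colour scale is free for the lines
  geom_point(colour = "grey60") +
  geom_smooth(aes(colour = "Activity"), method = "loess", formula = y ~ x) +
  geom_smooth(aes(y = C.I * k, colour = "C.I"), method = "loess",
              formula = y ~ x, se = FALSE) +
  # Named values tie each legend label to its line colour
  scale_colour_manual(name = NULL,
                      values = c("Activity" = "blue", "C.I" = "red")) +
  scale_y_continuous(sec.axis = sec_axis(~ . / k, name = "C.I")) +
  theme_bw()

p
```

With this variant the legend is generated from the mapping itself, so no override.aes is needed.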