Tags: r, graph, ggplot2, lattice

How to plot two variables with completely different scales on a single graph?


I have this data set of staff strength and total applications received:

df <- data.frame(year = seq(1970, 2015, by = 5),
                 staff = c(219, 231, 259, 352, 448, 427, 556, 555, 602, 622),
                 applications = c(5820, 7107, 6135, 16119, 19381, 36611, 54962, 45759, 40358, 458582))

I want to do some exploratory analysis and check whether staff strength is growing in line with the applications received. I plotted both variables as a line graph in Excel, but because the two scales are so different the result isn't very meaningful.

I also took the log of both variables, which got close to the desired result, but I wonder whether log-scale graphs are harder to explain to non-mathematicians: I want to use these graphs in a presentation to managerial staff who don't know much statistics or mathematics. My question is how to draw a meaningful graph in this situation. I have a gut feeling that R might offer a better solution than Excel (which is why I'm asking here), but the problem is how.

Any help will be highly appreciated.


Solution

  • One recommendation would be to convert your two measures into a single ratio metric, for example staff per application. In the following, I use staff per 1,000 applications:

    library(ggplot2)
    
    df <- data.frame(year = seq(1970, 2015, by = 5),
                     staff = c(219, 231, 259, 352, 448, 427, 556, 555, 602, 622),
                     applications = c(5820, 7107, 6135, 16119, 19381, 36611, 54962, 45759, 40358, 458582))
    
    ggplot(data = df, aes(x = year, y = staff / (applications / 1000))) +
      geom_point(size = 3) +
      geom_line() +
      ggtitle("Staff per 1,000 Applications")
    

    Plot 01
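
    As a small sketch that is not part of the original answer: it can help to store the ratio as its own column, so the exact values can be shown in a table next to the plot. The column name `staff_per_1000` is an assumption made here for illustration; only base R is used.

    ```r
    # Same data as in the question
    df <- data.frame(year = seq(1970, 2015, by = 5),
                     staff = c(219, 231, 259, 352, 448, 427, 556, 555, 602, 622),
                     applications = c(5820, 7107, 6135, 16119, 19381, 36611, 54962, 45759, 40358, 458582))

    # Hypothetical column name: staff per 1,000 applications
    df$staff_per_1000 <- df$staff / (df$applications / 1000)

    # Print a rounded copy for display; df itself keeps full precision
    print(transform(df, staff_per_1000 = round(staff_per_1000, 2)))
    ```

    With the column in place, the ggplot2 call could map `y = staff_per_1000` instead of repeating the expression.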

    We can achieve the same result without ggplot2 with:

    with(df, {
      # staff per 1,000 applications, computed once and reused
      ratio <- staff / (applications / 1000)
      plot(x = year, y = ratio, type = "l", main = "Staff per 1,000 Applications")
      points(x = year, y = ratio, pch = 21, cex = 2, bg = "black")
    })
    

    Base R Plot


    Alternatively, you could make your dataset a little more tidy (one row per year and measure) and plot the two measures in separate facets with free_y scales:

    library(tidyr)
    
    df_tidy <- gather(df, measure, value, -year)
    
    ggplot(data = df_tidy, aes(x = year, y = value)) +
      geom_point(size = 3) +
      geom_line() +
      facet_grid(measure ~ ., scales = "free_y")
    

    Plot 02
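
    A hedged side note, not from the original answer: in tidyr 1.0.0 and later, `gather()` is marked as superseded in favour of `pivot_longer()`. Assuming such a version is installed, the same reshape might look like the sketch below; `df_long` is a name chosen here for illustration.

    ```r
    library(tidyr)

    # Same data as in the question
    df <- data.frame(year = seq(1970, 2015, by = 5),
                     staff = c(219, 231, 259, 352, 448, 427, 556, 555, 602, 622),
                     applications = c(5820, 7107, 6135, 16119, 19381, 36611, 54962, 45759, 40358, 458582))

    # Reshape to one row per year and measure (assumes tidyr >= 1.0.0)
    df_long <- pivot_longer(df,
                            cols = -year,          # every column except year
                            names_to = "measure",  # former column names
                            values_to = "value")   # former cell values

    print(head(df_long))
    ```

    `df_long` has the same three columns as `df_tidy` (in a different row order), so it should work as a drop-in replacement in the faceted ggplot call.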