Tags: r, plot, ggplot2, ggvis

Looping ggplot/ggvis over identical columns and rows in two datasets


I am fairly new to R and therefore have to bother you with a basic question.

I have two large panel datasets (60 variables, each for 30 countries, covering the period 1950 to 2013). The 60 variables have identical names in both datasets; the values may or may not differ.

My final goal is to create 60 grids with 30 plots each: each grid refers to one of the 60 variables and contains one plot per country. Each plot will contain two lines, one from the first data frame and one from the second (both for the same variable).

I have done this in Stata before, using global vars and a simple loop. I am stuck in trying to make this work in R.

I cast the data into wide format for now (columns: Date, Country, Indicator1, ..., Indicator60), but have read that ggplot2 works better with long format (?).

My main issue is how to loop at all (for, lapply, a function, ...).

If not an answer, I would hugely appreciate ideas or hints on how to approach this problem, so that I can ask more specific questions if needed.

Edit: below is a reproducible sample of the data, as requested:

year <- c(2010, 2011, 2012, 2013, 2010, 2011, 2012, 2013,2010, 2011, 2012,     
    2013, 2010, 2011, 2012, 2013, 2010, 2011, 2012, 2013, 2010, 2011, 2012,    
    2013, 2010, 2011, 2012, 2013, 2010, 2011, 2012, 2013)
country <- c(rep("Australia", times =8), rep("Canada", times = 8),  
    rep("Australia", times =8), rep("Canada", times = 8))
indicator <- c(rep("Apples", times = 16), rep("Bananas", times = 16))
versiondata <- c(rep("new", times = 4), rep("old", times = 4), rep("new",  
    times = 4), rep("old", times = 4), rep("new", times = 4), rep("old", 
    times = 4), rep("new", times = 4), rep("old", times = 4))
value <- runif(32)
mydf <- data.frame(year, country, indicator, versiondata, value)  

I am still stuck on how exactly to write the do() call. I came up with this sorry bit, where I do not know how to specify the two y-variables (corresponding to old and new in the column versiondata).

mydf %>%
  group_by(indicator) %>%
  do({
    p <- ggplot(., aes(x=year)) + 
      geom_line(aes(y = ???)) +
      facet_wrap(~country) + ggtitle("indicator")
    })

Solution

  • A fairly standard approach for this kind of thing would be:

    library(ggplot2)

    # One faceted plot per indicator; color separates "new" and "old"
    by(mydf, mydf$indicator, function(X) {
      ggplot(X, aes(year, value, color = versiondata)) +
        geom_line() +
        facet_wrap(~country)
    })
    

    Using the indicator name as a title takes a little more finesse:

    # as.character() keeps the title a plain string even if indicator is a factor
    lapply(unique(as.character(mydf$indicator)), function(X) {
      ggplot(mydf[mydf$indicator == X, ], aes(year, value, color = versiondata)) +
        geom_line() +
        facet_wrap(~country) +
        labs(title = X)
    })
    

    Each indicator then gets its own plot: one panel per country, with one line per value of versiondata (new and old).
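
Since the question starts from two wide data frames rather than a single long one like mydf, here is a rough sketch of how one might get from one to the other before plotting. The data frames df_new and df_old, the helper make_wide(), and the indicator columns Apples and Bananas are made-up stand-ins, and dplyr and tidyr are assumed to be installed; treat it as one possible route, not the only one.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-ins for the two wide data sets (names and values are made up)
set.seed(1)
make_wide <- function() {
  data.frame(
    year    = rep(2010:2013, times = 2),
    country = rep(c("Australia", "Canada"), each = 4),
    Apples  = runif(8),
    Bananas = runif(8)
  )
}
df_new <- make_wide()
df_old <- make_wide()

# Stack the two data sets, recording the source in `versiondata`,
# then gather the indicator columns into indicator/value pairs
long_df <- bind_rows(new = df_new, old = df_old, .id = "versiondata") %>%
  pivot_longer(cols = -c(versiondata, year, country),
               names_to = "indicator", values_to = "value")

# One faceted plot per indicator, stored in a list named by indicator
indicators <- unique(long_df$indicator)
plots <- lapply(indicators, function(ind) {
  ggplot(long_df[long_df$indicator == ind, ],
         aes(year, value, color = versiondata)) +
    geom_line() +
    facet_wrap(~country) +
    labs(title = ind)
})
names(plots) <- indicators
```

Printing a single element, for example print(plots[["Apples"]]), draws the grid for that indicator. Keeping the plots in a named list makes it easy to loop over them later, for instance to save each one to a file.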