I am fairly new to R and therefore have to bother you with a basic question.
I have two large panel datasets (60 variables, each for 30 countries, covering the period 1950 to 2013). The 60 variables have identical names in both datasets; the data may or may not differ.
My final goal is to create 60 grids with 30 plots each: each grid refers to one of the 60 variables and contains a plot for each country. Each plot will contain 2 line graphs for the same variable, one from the first data frame and one from the second.
I have done this in Stata before, using global vars and a simple loop. I am stuck in trying to make this work in R.
I cast the data into wide format for now (columns: Date, Country, Indicator1, ..., Indicator60), but have read that ggplot2 does better with long format (?).
My main issue is how to loop at all (for, lapply, a custom function, ...).
If not an answer, I would hugely appreciate ideas or hints at how to approach this problem, so that I would manage to ask more specific questions, if needed.
Edit: below is a reproducible sample of the data, as requested:
year <- c(2010, 2011, 2012, 2013, 2010, 2011, 2012, 2013,2010, 2011, 2012,
2013, 2010, 2011, 2012, 2013, 2010, 2011, 2012, 2013, 2010, 2011, 2012,
2013, 2010, 2011, 2012, 2013, 2010, 2011, 2012, 2013)
country <- c(rep("Australia", times =8), rep("Canada", times = 8),
rep("Australia", times =8), rep("Canada", times = 8))
indicator <- c(rep("Apples", times = 16), rep("Bananas", times = 16))
versiondata <- c(rep("new", times = 4), rep("old", times = 4), rep("new",
times = 4), rep("old", times = 4), rep("new", times = 4), rep("old",
times = 4), rep("new", times = 4), rep("old", times = 4))
set.seed(42)  # fix the seed so the random values are reproducible
value <- runif(32)
mydf <- data.frame(year, country, indicator, versiondata, value)
I am still stuck on the exact expression inside do(). I came up with this sorry bit, where I do not know how to specify the two y-variables (corresponding to "old" and "new" in the column versiondata).
library(dplyr)
library(ggplot2)
mydf %>%
  group_by(indicator) %>%
  do({
    p <- ggplot(., aes(x = year)) +
      geom_line(aes(y = ???)) +  # placeholder: what goes here for "old" and "new"?
      facet_wrap(~country) + ggtitle("indicator")
  })
A fairly standard approach for this kind of thing would be:
library(ggplot2)
by(mydf, mydf$indicator, function(X) {
  # one faceted plot per indicator, one colored line per data version
  ggplot(X, aes(year, value, color = versiondata)) +
    geom_line() +
    facet_wrap(~country)
})
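With 60 indicators it may be more practical to keep the plots in an object and write them to a file than to print them to the console. The following is only a minimal sketch: it rebuilds a stand-in for the question's sample data with expand.grid(), and the output file name is a hypothetical choice, not something from the question.

```r
library(ggplot2)

# Small stand-in for the question's sample data (values are random).
set.seed(1)
mydf <- expand.grid(
  year = 2010:2013,
  versiondata = c("new", "old"),
  country = c("Australia", "Canada"),
  indicator = c("Apples", "Bananas"),
  stringsAsFactors = FALSE
)
mydf$value <- runif(nrow(mydf))

# by() returns one ggplot object per indicator, named after the indicator.
plots <- by(mydf, mydf$indicator, function(X) {
  ggplot(X, aes(year, value, color = versiondata)) +
    geom_line() +
    facet_wrap(~country)
})

# Hypothetical output file: one PDF page per indicator.
out_file <- file.path(tempdir(), "indicator_plots.pdf")
pdf(out_file)
for (nm in names(plots)) {
  print(plots[[nm]])  # ggplot objects must be printed explicitly inside a loop
}
dev.off()
```

A single plot can then be pulled out by name, for example plots[["Apples"]].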
Using the indicator name as a title can take a little more finesse:
lapply(as.character(unique(mydf$indicator)), function(X) {
  # subset to one indicator and use its name as the plot title
  ggplot(mydf[mydf$indicator == X, ], aes(year, value, color = versiondata)) +
    geom_line() +
    facet_wrap(~country) +
    labs(title = X)
})
For each indicator, the result should be one grid with a panel per country and one colored line per data version.
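Both calls above expect the data in long format, as in the sample mydf. If the real data still sit in two wide data frames, something along these lines might bring them into that shape. This is a sketch under assumptions: old_wide, new_wide and make_wide() are hypothetical names invented for illustration, and the column layout (Date, Country, Indicator1, ...) follows the description in the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper that builds a tiny wide data frame with random values.
set.seed(1)
make_wide <- function() {
  data.frame(
    Date = rep(2010:2013, times = 2),
    Country = rep(c("Australia", "Canada"), each = 4),
    Indicator1 = runif(8),
    Indicator2 = runif(8)
  )
}
old_wide <- make_wide()  # stands in for the first dataset
new_wide <- make_wide()  # stands in for the second dataset

# Stack both versions, labelling each row with its source,
# then gather all indicator columns into name/value pairs.
long_df <- bind_rows(old = old_wide, new = new_wide, .id = "versiondata") %>%
  pivot_longer(
    cols = starts_with("Indicator"),
    names_to = "indicator",
    values_to = "value"
  )
```

The resulting long_df has one row per date, country, indicator and data version, so it can be passed to the by() or lapply() calls in place of mydf (with Date and Country substituted for year and country).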