ggplot with a lot of groups; subplots (facets) for better arrangement


I have a data frame in long format with values for gross profits on assets, 10 industry classes, and a time span from 1970 to 2015. I want to plot each of these time series together with the (overall) mean. The problem is that the plot gets pretty chaotic, so I thought of splitting it into two or three subplots. I am using ggplot and managed to plot the time series, but I can't figure out how to do the subplots properly.

df <- melt(sic_j[1:11], id.vars='time', variable.name='Industry')
> head(df, 20)
   time                       Industry     value
1  1970 Agriculture, Forestry, Fishing 0.4450458
2  1971 Agriculture, Forestry, Fishing 0.3834808
3  1972 Agriculture, Forestry, Fishing 0.3970010
4  1973 Agriculture, Forestry, Fishing 0.3993006
5  1974 Agriculture, Forestry, Fishing 0.3960956
6  1975 Agriculture, Forestry, Fishing 0.4052760
7  1976 Agriculture, Forestry, Fishing 0.3856735
8  1977 Agriculture, Forestry, Fishing 0.4062286
9  1978 Agriculture, Forestry, Fishing 0.3631151
10 1979 Agriculture, Forestry, Fishing 0.3987136
11 1980 Agriculture, Forestry, Fishing 0.3926147
12 1981 Agriculture, Forestry, Fishing 0.3207508
13 1982 Agriculture, Forestry, Fishing 0.3638654
14 1983 Agriculture, Forestry, Fishing 0.2901777
15 1984 Agriculture, Forestry, Fishing 0.3329089
16 1985 Agriculture, Forestry, Fishing 0.3384187
17 1986 Agriculture, Forestry, Fishing 0.3142270
18 1987 Agriculture, Forestry, Fishing 0.3610059
19 1988 Agriculture, Forestry, Fishing 0.2502937
20 1989 Agriculture, Forestry, Fishing 0.3156292

ggplot(df, aes(x=time, y=value))+
  geom_line(aes(group=Industry, color=Industry))+
  stat_summary(fun.y=mean, na.rm=T, group=11, alpha=1, color='red', size=1.5, geom='line')+
  theme_bw()+
  labs(x='year', y='gross profits on assets',
    color=NULL)+theme(legend.position = 'bottom')

I tried the following with facet_grid:

ggplot(df, aes(x=time, y=value))+
  geom_line(aes(group=Industry, color=Industry))+
  stat_summary(fun.y=mean, na.rm=T, group=11, alpha=1, color='red', size=1.5, geom='line')+
  theme_bw()+
  labs(x='year', y='gross profits on assets',
    color=NULL)+theme(legend.position = 'bottom')+facet_grid(Industry~.)

All I manage to get is a plot with one narrow panel per industry, which is obviously useless.

I tried to split up the groups in order to have 3-4 industries per subplot, but I got this error:

Error in combine_vars(data, params$plot_env, vars, drop = params$drop) : 
  At least one layer must contain all variables used for facetting

In the end I would like to have a well-arranged plot of these 11 time series (10 industries and the mean). Since I have already tried different colors, linetypes and points, I think a few subplots are the best way, but maybe someone has a better idea?


Solution

  • Suppose we have input data like the following:

    time <- 1970:2011
    industry <- letters[1:10]
    
    dat <- expand.grid(time=time, industry=industry)
    dat$value <- rnorm(nrow(dat))
    

    A ggplot of this data is just as cluttered as the one in the question:

    library(ggplot2)

    ggplot(dat, aes(time, value, colour=industry)) + 
        geom_line()
    

    One way to put several series into a single facet is to create a new grouping variable. In this case, I'm assigning the first three industries to group_one, the next three to group_two, and the remaining four to group_three:

    library(tidyverse)
    dat2 <- dat %>% 
       mutate(group_one = ifelse(industry %in% letters[1:3], value, NA),
               group_two = ifelse(industry %in% letters[4:6], value, NA),
               group_three = ifelse(industry %in% letters[7:10], value, NA)) %>%
       gather(variable, new_val, group_one:group_three)
    

    The faceted plot now looks somewhat neater:

    ggplot(dat2, aes(time, new_val, colour=industry)) + geom_line() + 
        facet_wrap(~variable, ncol=1)
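
A possible simplification of the grouping step above, offered as a sketch rather than as the answer's own method: instead of spreading the values over three columns and gathering them again, a single grouping column can be added with dplyr's case_when and used directly in facet_wrap. The names dat_grouped and panel are made up for this illustration, and the fixed seed is an assumption added only for reproducibility.

```r
library(dplyr)
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the random data are reproducible

# Same toy data as in the answer: 10 industries, yearly values 1970-2011
dat <- expand.grid(time = 1970:2011, industry = letters[1:10])
dat$value <- rnorm(nrow(dat))

# Hypothetical column `panel`: assigns every industry to exactly one facet
dat_grouped <- dat %>%
  mutate(panel = case_when(
    industry %in% letters[1:3] ~ "group_one",
    industry %in% letters[4:6] ~ "group_two",
    TRUE                       ~ "group_three"
  ))

# One facet per group; no NA rows are created, so nothing is dropped
p_grouped <- ggplot(dat_grouped, aes(time, value, colour = industry)) +
  geom_line() +
  facet_wrap(~panel, ncol = 1)
```

Printing p_grouped draws the figure. Because each row belongs to exactly one group, this version avoids the NA rows that the ifelse/gather approach produces.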
    

    Edit:

    An additional line can be overlaid on all facets with the annotate function. Because an annotation is not tied to the faceting variable, it is drawn in every panel.

    First, generate the summary table with mean value for each time point:

    dat3 <- dat %>% 
        group_by(time) %>% 
        summarise(mean.value=mean(value))
    

    Then add the annotate layer to the ggplot above:

    ggplot(dat2, aes(time, new_val, colour=industry)) + 
      geom_line() + 
      facet_wrap(~variable, ncol=1) + 
      annotate(geom="line", x=dat3$time, y=dat3$mean.value, 
               color='red', size=1.5)
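
As an alternative sketch to annotate (an illustration, not part of the original answer): ggplot2 repeats a layer in every panel when that layer's data does not contain the faceting variable, so the yearly means can also be drawn with an ordinary geom_line that gets its own data frame. The names dat_grouped, panel and mean_by_year are made up for this example, and linewidth assumes ggplot2 3.4.0 or later (older versions use size instead).

```r
library(dplyr)
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the random data are reproducible

dat <- expand.grid(time = 1970:2011, industry = letters[1:10])
dat$value <- rnorm(nrow(dat))

# Hypothetical facet column, one group per row
dat_grouped <- dat %>%
  mutate(panel = case_when(
    industry %in% letters[1:3] ~ "group_one",
    industry %in% letters[4:6] ~ "group_two",
    TRUE                       ~ "group_three"
  ))

# Overall mean per year; deliberately has no `panel` column
mean_by_year <- dat %>%
  group_by(time) %>%
  summarise(mean_value = mean(value))

p_mean <- ggplot(dat_grouped, aes(time, value)) +
  geom_line(aes(colour = industry)) +
  # This layer lacks the facet variable, so it is repeated in every panel
  geom_line(data = mean_by_year, aes(y = mean_value),
            colour = "red", linewidth = 1.5) +
  facet_wrap(~panel, ncol = 1)
```

This is also why the error in the question appears only when no layer at all contains the faceting variable: at least one layer (here the industry lines) has to carry it.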
    

    Note that this last plot looks a little different from the earlier ones because the random data were generated with a different seed.