
Multiple lines on a line plot in R


I'm trying to create a line plot in R, showing lines for different places over time.

My data is in a table with Year in the first column and the places (England, Scotland, Wales, NI) as separate columns:

     Year   England Scotland Wales  NI
1  2006/07      NA     411   188   111
2  2007/08      NA     415   193   112
3  2008/09      NA     424   194   114
4  2009/10      NA     429   194   115
5  2010/11      NA     428   199   116
6  2011/12      NA     428   200   116
7  2012/13      NA     425   199   117
8  2013/14      NA     427   202   117
9  2014/15      NA     431   200   121
10 2015/16   3556      432   199   126
11 2016/17   3436      431   200   129
12 2017/18   3467      NA    NA    NA

I'm using ggplot and can get a line plot for any one of the places, but I'm having difficulty getting lines for all the places on the same plot.

It seems like this might work if I had the places in a column as well (instead of across the top), as then I could set y in the code below to be that column rather than a column for one specific place. But that seems a bit convoluted, and as I have lots of data in the existing format, I'm hoping there's either a way to do this with the format I have or a quick way of transforming it.

    library(ggplot2)
    
    # works, but only draws a single place (England)
    ggplot(data = mysheets$sheet1, aes(x = Year, y = England, group = 1)) +
      geom_line() +
      geom_point()
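For comparison, the snippet above can be extended without reshaping by adding one geom_line() layer per place column. This is a minimal sketch, assuming a small stand-in data frame (the names `wide` and `p_wide` are hypothetical) built from the first three rows of the table, with Year reduced to its starting year and England left out because it is NA in those rows:

```r
library(ggplot2)

# First three rows of the question's table, kept in wide format
# (one column per place). England is omitted because it is NA here.
wide <- data.frame(
  Year = c(2006, 2007, 2008),   # start year of "2006/07", "2007/08", "2008/09"
  Scotland = c(411, 415, 424),
  Wales = c(188, 193, 194),
  NI = c(111, 112, 114)
)

# One geom_line() per column; mapping colour to a constant string
# gives each layer its own legend entry.
p_wide <- ggplot(wide, aes(x = Year)) +
  geom_line(aes(y = Scotland, colour = "Scotland")) +
  geom_line(aes(y = Wales, colour = "Wales")) +
  geom_line(aes(y = NI, colour = "NI")) +
  labs(y = "value", colour = "country")

print(p_wide)
```

This works, but every extra place needs another layer and the legend is assembled by hand, which is why a long format is usually the better fit for ggplot2.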

From what I can tell, I'll need to reshape my data (into long form?), but I haven't found a way to do that when I don't already have a column for places (i.e., I have a column for each place, but nothing in the table says that these columns are all places and the same kind of thing).
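A long format does not need an existing place column: tidyr's pivot_longer() creates one from the column names. This is a minimal sketch on the first two rows of the table; the data frame names `wide` and `long` and the new column name `place` are illustrative choices, not anything from the original data:

```r
library(tidyr)

# Two rows of the question's table in its original wide layout
wide <- data.frame(
  Year = c("2006/07", "2007/08"),
  England = c(NA_real_, NA_real_),
  Scotland = c(411, 415),
  Wales = c(188, 193),
  NI = c(111, 112)
)

# The place names currently live in the column headers; pivot_longer()
# turns those headers into the values of a new "place" column.
long <- pivot_longer(
  wide,
  cols = England:NI,
  names_to = "place",
  values_to = "value"
)

print(long)
```

Each original row becomes four rows, one per place, and the old headers end up as the values of `place`.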

I've also tried transposing my data so that the places are down the side and the years are along the top, but R still uses its own headers for the columns. I guess another option might be to have the years as headers and have R recognise them as such?


Solution

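  • As you said, you have to convert to long format to make the most of ggplot2: once there is a single country column, you can map it to colour and get one line per country from a single geom_line() call.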

    library(ggplot2)
    library(dplyr)
    
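    # The header row has one field fewer than the data rows, so read.table()
    # treats it as a header and uses the leading row numbers as row names
    mydata_raw <- read.table(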
      text = "
      Year   England Scotland Wales  NI
      1  2006/07      NA     411   188   111
      2  2007/08      NA     415   193   112
      3  2008/09      NA     424   194   114
      4  2009/10      NA     429   194   115
      5  2010/11      NA     428   199   116
      6  2011/12      NA     428   200   116
      7  2012/13      NA     425   199   117
      8  2013/14      NA     427   202   117
      9  2014/15      NA     431   200   121
      10 2015/16   3556      432   199   126
      11 2016/17   3436      431   200   129
      12 2017/18   3467      NA    NA    NA"
    )
    
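    # long format: one row per Year/country pair
    # (tidyr::gather(country, value, England:NI) is the older, superseded equivalent)
    mydata <- mydata_raw %>% 
      tidyr::pivot_longer(England:NI, names_to = "country", values_to = "value") %>% 
      dplyr::mutate(Year = as.numeric(substring(Year, 1, 4))) # keep the start year, e.g. "2006/07" -> 2006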
    
    ggplot(mydata, aes(x = Year, y = value, color = country)) + 
      geom_line() +
      geom_point()
    

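If you would rather keep the original "2006/07"-style labels on the x axis instead of converting Year to a number, you can leave Year as character and add group = country. With a discrete x axis, ggplot2 otherwise forms one group per Year/country combination, each holding a single point, so geom_line() has nothing to connect (the same reason the question's single-line code needed group = 1). This is a minimal sketch on a few rows of the table; `wide`, `long` and `p_labels` are illustrative names:

```r
library(ggplot2)
library(tidyr)

# A few rows of the question's table, with Year kept as the original labels
wide <- data.frame(
  Year = c("2013/14", "2014/15", "2015/16", "2016/17"),
  England = c(NA, NA, 3556, 3436),
  Scotland = c(427, 431, 432, 431),
  Wales = c(202, 200, 199, 200),
  NI = c(117, 121, 126, 129)
)

long <- pivot_longer(wide, England:NI, names_to = "country", values_to = "value")

# Year is character, so the x axis is discrete. group = country tells
# ggplot2 to join the points of each country into one line.
p_labels <- ggplot(long, aes(x = Year, y = value, colour = country, group = country)) +
  geom_line() +
  geom_point()

print(p_labels)
```

The missing England values only produce a "Removed rows containing missing values" warning; the remaining points are still joined.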