Search code examples
rdplyrtidyrreformat

Reformat data frame to long format keeping grouped columns


Consider the following data frame:

set.seed(42)
ID <- c(1:6) 
OB <- c(rep("A",4),rep("B",2))
lat_start <- rnorm(6,42,2)
lon_start <- rnorm(6,12,2)
lat_stopp <- rnorm(6,42,2)
lon_stopp <- rnorm(6,12,2)
df <- data.frame(ID,OB,lat_start,lon_start,lat_stopp,lon_stopp)

I would like to reformat df to a long format, where each ID has a row for the start and stop coordinates. A simple gather() solution, e.g. df_wrong <- gather(df,coords,val,lat_start:lon_stopp) will obviously not work, as I need the lat/lon columns to stay grouped. I want the long data frame to look something like this:

   ID OB    SS      lat       lon
1   1  A start 44.74192 15.023040
2   1  A  stop 39.22228  7.119066
3   2  A start 40.87060 11.810680
4   2  A  stop 41.44242 14.640227
5   3  A start 42.72626 16.036850
6   3  A  stop 41.73336 11.386723
7   4  A start 43.26573 11.874570
8   4  A  stop 43.27190  8.437383
9   5  B start 42.80854 14.609740
10  5  B  stop 41.43149 11.656165
11  6  B start 41.78775 16.573290
12  6  B  stop 36.68709 14.429349

The SS column can of course be added later. Any suggestions will be much appreciated!


Solution

  • One tidyverse possibility could be:

    df %>%
     gather(var, val, -c(ID, OB)) %>%
     separate(var, c("var1", "SS")) %>%
     spread(var1, val)
    
       ID OB    SS      lat       lon
    1   1  A start 44.74192 15.023044
    2   1  A stopp 39.22228  7.119066
    3   2  A start 40.87060 11.810682
    4   2  A stopp 41.44242 14.640227
    5   3  A start 42.72626 16.036847
    6   3  A stopp 41.73336 11.386723
    7   4  A start 43.26573 11.874572
    8   4  A stopp 43.27190  8.437383
    9   5  B start 42.80854 14.609739
    10  5  B stopp 41.43149 11.656165
    11  6  B start 41.78775 16.573291
    12  6  B stopp 36.68709 14.429349