Search code examples
rtidyrreshape2

How to convert long format to wide format data over multiple variable (Column) and be stacked onto each other?


I have a monthly time series data (1987-2017) for 20 station. I want to convert the long format data to wide format data such that all the data covering 20 station are in one data frame.

head(Monthly_rainfall2[-1:-20,1:5]) #long format data

# A tibble: 6 x 5
  Year  Month  stn1    stn2   stn3
  <chr> <ord> <dbl>    <dbl>  <dbl>
1 1987  Jan      NA       NA    0  
2 1987  Feb      NA       NA   60.5
3 1987  Mar      NA       NA   66  
4 1987  Apr      NA       NA   64  
5 1987  May      NA       NA  183. 
6 1987  Jun      NA       NA  216

Note that the Month Column is a ordered factor.

 dput(Monthly_rainfall2[21:50,1:4])
 structure(list(Year = c("1987", "1987", "1987", "1987", "1987", 
"1987", "1987", "1987", "1987", "1987", "1987", "1987", "1988", 
"1988", "1988", "1988", "1988", "1988", "1988", "1988", "1988", 
"1988", "1988", "1988", "1989", "1989", "1989", "1989", "1989", 
"1989"), Month = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 
9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 
12L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("Jan", "Feb", "Mar", 
"Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
), class = c("ordered", "factor")), stn1 = c(NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), stn2 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -30L))

I have tried the following code

library(tidyr)

wide_data <- spread(Monthly_rainfall2[1:3], Month, stn1 )

The above code provides what I want however, I am not able to produce for all station at once in a single dataframe

I want my data frame to appear like this, where stn2 starts immediately after stn1 and subsequently follows the same pattern for all the station

Year     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#stn1
1987     0.8 0.5 0.8 2   20   25  30  30  21  22 3    0
1988     1   1.2  1.8 2  20   22   25 21   15  12  10  9
...
2017     0.5  1  14  19   17   14  15  13  10  14  18  10
#stn2
1987     0.8 0.5 0.8 2   20   25  30  30  21  22 3    0
1988     1   1.2  1.8 2  20   22   25 21   15  12  10  9
...
2017     0.5  1  14  19   17   14  15  13  10  14  18  10
#stn3
1987     0.8 0.5 0.8 2   20   25  30  30  21  22 3    0
1988     1   1.2  1.8 2  20   22   25 21   15  12  10  9
...
2017     0.5  1  14  19   17   14  15  13  10  14  18  10

Solution

  • A solution using data.table

    DT <- as.data.table(df)
    
    #filling all ne NAs for optical rasons
    DT[, c("stn1", "stn2")  := .(sample(1:100, 30), sample(1:100, 30))]
    
    dcast(DT, Month ~ Year, value.var = c("stn1", "stn2")) %>%
      melt(id.vars = 1, variable.name = "year") %>%
      dcast(year ~ Month, value.var = "value") -> DT2
    
            year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
    1: stn1_1987  98   5  89  84  75  36   4  34  26  78  33  28
    2: stn1_1988  67  74  40  63   9  19  79  61  66  93  47  62
    3: stn1_1989  68  29   7  46  54  87  NA  NA  NA  NA  NA  NA
    4: stn2_1987  31  61  74  89  46  54  70  80  84   6  96  32
    5: stn2_1988  75  71  11  99  20   7  77  13  52  14   2  41
    6: stn2_1989  83  22  97  43  59  15  NA  NA  NA  NA  NA  NA
    
    

    If you want yearand stnseperated you can do:

    DT2[, c("stn", "year") := tstrsplit(year, "_", fixed=TRUE)]
    
       year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec  stn
    1: 1987  98   5  89  84  75  36   4  34  26  78  33  28 stn1
    2: 1988  67  74  40  63   9  19  79  61  66  93  47  62 stn1
    3: 1989  68  29   7  46  54  87  NA  NA  NA  NA  NA  NA stn1
    4: 1987  31  61  74  89  46  54  70  80  84   6  96  32 stn2
    5: 1988  75  71  11  99  20   7  77  13  52  14   2  41 stn2
    6: 1989  83  22  97  43  59  15  NA  NA  NA  NA  NA  NA stn2