I have a monthly time series data (1987-2017) for 20 station. I want to convert the long format data to wide format data such that all the data covering 20 station are in one data frame.
head(Monthly_rainfall2[-1:-20,1:5]) #long format data
# A tibble: 6 x 5
Year Month stn1 stn2 stn3
<chr> <ord> <dbl> <dbl> <dbl>
1 1987 Jan NA NA 0
2 1987 Feb NA NA 60.5
3 1987 Mar NA NA 66
4 1987 Apr NA NA 64
5 1987 May NA NA 183.
6 1987 Jun NA NA 216
Note that the Month Column is a ordered factor.
dput(Monthly_rainfall2[21:50,1:4])
structure(list(Year = c("1987", "1987", "1987", "1987", "1987",
"1987", "1987", "1987", "1987", "1987", "1987", "1987", "1988",
"1988", "1988", "1988", "1988", "1988", "1988", "1988", "1988",
"1988", "1988", "1988", "1989", "1989", "1989", "1989", "1989",
"1989"), Month = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("Jan", "Feb", "Mar",
"Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
), class = c("ordered", "factor")), stn1 = c(NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), stn2 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -30L))
I have tried the following code
library(tidyr)
wide_data <- spread(Monthly_rainfall2[1:3], Month, stn1 )
The above code provides what I want however, I am not able to produce for all station at once in a single dataframe
I want my data frame to appear like this, where stn2 starts immediately after stn1 and subsequently follows the same pattern for all the station
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#stn1
1987 0.8 0.5 0.8 2 20 25 30 30 21 22 3 0
1988 1 1.2 1.8 2 20 22 25 21 15 12 10 9
...
2017 0.5 1 14 19 17 14 15 13 10 14 18 10
#stn2
1987 0.8 0.5 0.8 2 20 25 30 30 21 22 3 0
1988 1 1.2 1.8 2 20 22 25 21 15 12 10 9
...
2017 0.5 1 14 19 17 14 15 13 10 14 18 10
#stn3
1987 0.8 0.5 0.8 2 20 25 30 30 21 22 3 0
1988 1 1.2 1.8 2 20 22 25 21 15 12 10 9
...
2017 0.5 1 14 19 17 14 15 13 10 14 18 10
A solution using data.table
DT <- as.data.table(df)
#filling all ne NAs for optical rasons
DT[, c("stn1", "stn2") := .(sample(1:100, 30), sample(1:100, 30))]
dcast(DT, Month ~ Year, value.var = c("stn1", "stn2")) %>%
melt(id.vars = 1, variable.name = "year") %>%
dcast(year ~ Month, value.var = "value") -> DT2
year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1: stn1_1987 98 5 89 84 75 36 4 34 26 78 33 28
2: stn1_1988 67 74 40 63 9 19 79 61 66 93 47 62
3: stn1_1989 68 29 7 46 54 87 NA NA NA NA NA NA
4: stn2_1987 31 61 74 89 46 54 70 80 84 6 96 32
5: stn2_1988 75 71 11 99 20 7 77 13 52 14 2 41
6: stn2_1989 83 22 97 43 59 15 NA NA NA NA NA NA
If you want year
and stn
seperated you can do:
DT2[, c("stn", "year") := tstrsplit(year, "_", fixed=TRUE)]
year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec stn
1: 1987 98 5 89 84 75 36 4 34 26 78 33 28 stn1
2: 1988 67 74 40 63 9 19 79 61 66 93 47 62 stn1
3: 1989 68 29 7 46 54 87 NA NA NA NA NA NA stn1
4: 1987 31 61 74 89 46 54 70 80 84 6 96 32 stn2
5: 1988 75 71 11 99 20 7 77 13 52 14 2 41 stn2
6: 1989 83 22 97 43 59 15 NA NA NA NA NA NA stn2