I have data that look like the following:
>loan data
ID loan_start_date loan_maturity_date feb13 march13 april13........
1 2016-01-03 2017-01-03 46 45 44
1 2011-01-08 2013-01-08 NA NA NA
1 2013-02-13 2015-02-13 23 22 21
2 2012-02-03 2016-05-03 38 37 36
2 2013-05-08 2014-01-09 10 09 08
2 2011-03-13 2013-02-18 0 NA NA
3 2015-07-03 2016-01-08 34 33 32
3 2013-01-09 2015-07-08 28 27 26
Can I create panel data from this? If yes, how can I do that in R? In the panel data, the cross-section dimension is ID and time dimension is feb13,march13,april13 (time to loan maturity date from that specific month).....goes on for 48 months. I have looked at other examples of creating panel data but in those examples, each ID just took one row, now each ID is sitting in multiple rows. So I am confused how to create panel out of this. Would really appreciate your help.
Edit: The expected outcome should look like this if I am correct:
>loan data
ID months time to maturity
1 feb13 46
1 march13 45
1 april13 44
.
.
.
1 jan17 0
1 feb13 NA
1 march13 NA
1 april13 NA
.
.
.
1 jan17 NA
1 feb13 23
1 march13 22
1 april13 21
.
.
.
1 jan17 NA
2 feb13 38
2 march13 37
2 april13 36
.
.
.
2 jan17 NA
2 feb13 10
2 march13 09
2 april13 08
.
.
.
2 jan17 NA
2 feb13 0
2 march13 NA
2 april13 NA
.
.
.
2 jan17 NA
2 feb13 0
2 march13 NA
2 april13 NA
.
.
.
2 jan17 NA
3 feb13 34
3 march13 33
3 april13 32
.
.
.
3 jan17 NA
3 feb13 28
3 march13 27
3 april13 26
.
.
.
3 jan17 NA
Another option is to use the gather()
function from the tidyr
package.
This function turns multiple columns into a single key-value pair. You need to specify the data frame, the name of the new "key" column, the name of the new "value" column, and then specify which columns to gather. If (like in this case) there are more columns to include than exclude, you can simply specify which columns to exclude instead:
library(tidyr)
gather(data, key = "month_year", value = "months_to_maturity", -(ID:loan_maturity_date))
#> ID start_date loan_maturity_date month_year months_to_maturity
#> 1 1 2016-01-03 2017-01-03 feb13 46
#> 2 1 2011-01-08 2013-01-08 feb13 NA
#> 3 1 2013-02-13 2015-02-13 feb13 23
#> 4 2 2012-02-03 2016-05-03 feb13 38
#> 5 2 2013-05-08 2014-01-09 feb13 10
#> 6 2 2011-03-13 2013-02-18 feb13 0
#> 7 3 2015-07-03 2016-01-08 feb13 34
#> 8 3 2013-01-09 2015-07-08 feb13 28
#> 9 1 2016-01-03 2017-01-03 march13 45
#> 10 1 2011-01-08 2013-01-08 march13 NA
#> 11 1 2013-02-13 2015-02-13 march13 22
#> 12 2 2012-02-03 2016-05-03 march13 37
#> 13 2 2013-05-08 2014-01-09 march13 9
#> 14 2 2011-03-13 2013-02-18 march13 NA
#> 15 3 2015-07-03 2016-01-08 march13 33
#> 16 3 2013-01-09 2015-07-08 march13 27
#> 17 1 2016-01-03 2017-01-03 april13 44
#> 18 1 2011-01-08 2013-01-08 april13 NA
#> 19 1 2013-02-13 2015-02-13 april13 21
#> 20 2 2012-02-03 2016-05-03 april13 36
#> 21 2 2013-05-08 2014-01-09 april13 8
#> 22 2 2011-03-13 2013-02-18 april13 NA
#> 23 3 2015-07-03 2016-01-08 april13 32
#> 24 3 2013-01-09 2015-07-08 april13 26
And the data used for this:
df <-
data.frame(ID = c(1,1,1,2,2,2,3,3),
start_date = c("2016-01-03",
"2011-01-08",
"2013-02-13",
"2012-02-03",
"2013-05-08",
"2011-03-13",
"2015-07-03",
"2013-01-09"),
loan_maturity_date = c("2017-01-03",
"2013-01-08",
"2015-02-13",
"2016-05-03",
"2014-01-09",
"2013-02-18",
"2016-01-08",
"2015-07-08"),
feb13 = c(46,
NA,
23,
38,
10,
0 ,
34,
28),
march13 = c(45,
NA,
22,
37,
09,
NA,
33,
27),
april13 = c(44,
NA,
21,
36,
08,
NA,
32,
26))