Search code examples
rsortingtimepanellag

convert Days in occurences in R


I have a data of this form :

ID    FIRM   PAY_START    PAY_end
1      A        1         359
1      A        360       500
1      B        720       780
1      C        800       930
1      D        934       1200
2      E        1         316
2      E        320       360
3      F        1         339
3      F        340       342
3      G        345       600

I want First to convert PAY_START( which are in Days) into occurences in the Panel 1, 2, 3 like this:

ID    FIRM   PAY_START    PAY_end  Occurence
1      A        1         359      1
1      A        360       500      2
1      B        720       780      3
1      C        800       930      4
1      D        934       1200     5
2      E        1         316      1
2      E        320       360      2
3      F        1         339      1
3      F        340       342      2
3      G        345       600      3

And I want to compute the lags of Pay_end:

ID    FIRM   PAY_START    PAY_end  Occurence   Lag_Pay_end
1      A        1         359      1           0
1      A        360       500      2           141
1      B        720       780      3           280
1      C        800       930      4           150
1      D        934       1200     5           270
2      E        1         316      1           0
2      E        320       360      2           44
3      F        1         339      1           0
3      F        340       342      2           3
3      G        345       600      3           258

Thank you for your help!


Solution

  • a data.table approach

    library(data.table)
    DT <- fread("ID    FIRM   PAY_START    PAY_end
    1      A        1         359
    1      A        360       500
    1      B        720       780
    1      C        800       930
    1      D        934       1200
    2      E        1         316
    2      E        320       360
    3      F        1         339
    3      F        340       342
    3      G        345       600")
    
    DT[, Occurence := rowid(ID)]
    DT[, Lag_Pay_end := PAY_end - shift(PAY_end, type = "lag", fill = PAY_end[1]),
       by = .(ID)]
    #    ID FIRM PAY_START PAY_end Occurence Lag_Pay_end
    # 1:  1    A         1     359         1           0
    # 2:  1    A       360     500         2         141
    # 3:  1    B       720     780         3         280
    # 4:  1    C       800     930         4         150
    # 5:  1    D       934    1200         5         270
    # 6:  2    E         1     316         1           0
    # 7:  2    E       320     360         2          44
    # 8:  3    F         1     339         1           0
    # 9:  3    F       340     342         2           3
    #10:  3    G       345     600         3         258