Search code examples
rdatabasedaterowpanel

Separate columns indicating outcomes for two different date in two different rows


I think the title is hard to understand. I will give you a proper example of what I have and what I want.

I have this data for a lot of observations indicating the ID of the individual and some outcomes (as wage and hours) before treatment and after treatment

ID         Wage_{t}  Wage_{t-1}   Hours_{t}   Hours_{t-1}  Establishment
Brain      34563     34563        45          43            X1
Lucke      2545      2356         35          36            E3
Jasmine    26789     1345         42          44            E3
Leila      1000      1234         38          39            E3
Sophie     35421     23453        50          57            Y6

I want to separate the observation before and after treatment in rows and indicate the before/after observation with a dummy variable that takes 1 if the observation is after:

ID         Wage    Hours         Establishment   After_dummy
Brain      34563   43            X1              0
Brain      34563   45            X1              1
Lucke      2356    36            E3              0
Lucke      2545    35            E3              1
Jasmine    1345    44            E3              0
Jasmine    26789   42            E3              1
Leila      1234    39            E3              0
Leila      1000    38            E3              1
Sophie     23453   57            Y6              0
Sophie     35421   50            Y6              1



Solution

  • The column names with braces and hyphens should be modified. In addition, you can also include the desired "dummy" value in the column names. This will make it easier to reshape your data into long format with something like pivot_longer.

    In this case, Wage_t_1 would represent Wage_{t} which will have an after_dummy value of 1.

    library(tidyverse)
    
    names(df) <- c("ID", "Wage_t_1","Wage_t_0", "Hours_t_1", "Hours_t_0", "Establishment")
    
    pivot_longer(df,
                 cols = -c(ID, Establishment), 
                 names_to = c(".value", "after_dummy"),
                 names_pattern = "(Wage|Hours)_t_(\\d+)")
    

    Output

       ID      Establishment after_dummy  Wage Hours
       <chr>   <chr>         <chr>       <int> <int>
     1 Brain   X1            1           34563    45
     2 Brain   X1            0           34563    43
     3 Lucke   E3            1            2545    35
     4 Lucke   E3            0            2356    36
     5 Jasmine E3            1           26789    42
     6 Jasmine E3            0            1345    44
     7 Leila   E3            1            1000    38
     8 Leila   E3            0            1234    39
     9 Sophie  Y6            1           35421    50
    10 Sophie  Y6            0           23453    57
    

    Data

    df <- structure(list(ID = c("Brain", "Lucke", "Jasmine", "Leila", "Sophie"
    ), Wage_t_1 = c(34563L, 2545L, 26789L, 1000L, 35421L), Wage_t_0 = c(34563L, 
    2356L, 1345L, 1234L, 23453L), Hours_t_1 = c(45L, 35L, 42L, 38L, 
    50L), Hours_t_0 = c(43L, 36L, 44L, 39L, 57L), Establishment = c("X1", 
    "E3", "E3", "E3", "Y6")), class = "data.frame", row.names = c(NA, 
    -5L))