I think the title is hard to understand. I will give you a proper example of what I have and what I want.
I have this data for a lot of observations indicating the ID of the individual and some outcomes (as wage and hours) before treatment and after treatment
ID Wage_{t} Wage_{t-1} Hours_{t} Hours_{t-1} Establishment
Brain 34563 34563 45 43 X1
Lucke 2545 2356 35 36 E3
Jasmine 26789 1345 42 44 E3
Leila 1000 1234 38 39 E3
Sophie 35421 23453 50 57 Y6
I want to separate the observation before and after treatment in rows and indicate the before/after observation with a dummy variable that takes 1 if the observation is after:
ID Wage Hours Establishment After_dummy
Brain 34563 43 X1 0
Brain 34563 45 X1 1
Lucke 2356 36 E3 0
Lucke 2545 35 E3 1
Jasmine 1345 44 E3 0
Jasmine 26789 42 E3 1
Leila 1234 39 E3 0
Leila 1000 38 E3 1
Sophie 23453 57 Y6 0
Sophie 35421 50 Y6 1
The column names with braces and hyphens should be modified. In addition, you can also include the desired "dummy" value in the column names. This will make it easier to reshape your data into long format with something like pivot_longer
.
In this case, Wage_t_1
would represent Wage_{t}
which will have an after_dummy
value of 1.
library(tidyverse)
names(df) <- c("ID", "Wage_t_1","Wage_t_0", "Hours_t_1", "Hours_t_0", "Establishment")
pivot_longer(df,
cols = -c(ID, Establishment),
names_to = c(".value", "after_dummy"),
names_pattern = "(Wage|Hours)_t_(\\d+)")
Output
ID Establishment after_dummy Wage Hours
<chr> <chr> <chr> <int> <int>
1 Brain X1 1 34563 45
2 Brain X1 0 34563 43
3 Lucke E3 1 2545 35
4 Lucke E3 0 2356 36
5 Jasmine E3 1 26789 42
6 Jasmine E3 0 1345 44
7 Leila E3 1 1000 38
8 Leila E3 0 1234 39
9 Sophie Y6 1 35421 50
10 Sophie Y6 0 23453 57
Data
df <- structure(list(ID = c("Brain", "Lucke", "Jasmine", "Leila", "Sophie"
), Wage_t_1 = c(34563L, 2545L, 26789L, 1000L, 35421L), Wage_t_0 = c(34563L,
2356L, 1345L, 1234L, 23453L), Hours_t_1 = c(45L, 35L, 42L, 38L,
50L), Hours_t_0 = c(43L, 36L, 44L, 39L, 57L), Establishment = c("X1",
"E3", "E3", "E3", "Y6")), class = "data.frame", row.names = c(NA,
-5L))