I have a longitudinal dataset which I created by merging different datasets using the personal identifier column.The dataset columns are in the order personal identifier, a_sex, a_countryofbirth, a_health, a_educationstatus, b_sex, b_countryofbirth, b_health, b_educationstatus, c_sex, c_countryofbirth, c_health, c_educationstatus etc all the way to l. All variables starting with a_ represented the first wave, variables starting with b represented the second wave and so on-
I am trying to create a new variable called Wave using Pivot longer so that my table looks like: -
Table: InPreg_transformed
**
- Person ID Wave Sex CountryofBirth Health
**
I used this code among others but it did work.
InPreg_transformed<- InPregDF %>%
pivot_longer(cols = contains("_"),
names_to = c("_value", "Wave"),
names_pattern = "(_+)"
The other code I used: -
InPreg_transformed<- InPreg %>%
pivot_longer(cols = contains("."), names_to = c(".value",
"Wave"), names_pattern = "(.+).(.+)")
summary(InPreg_transformed)
Please assist
Just to be sure that I understood correctly, I created a random nonsense example with n
"individuals".
First load the librairies:
library(tibble)
library(dplyr)
library(tidyr)
Then create the dataset:
sex <- c("Male", "Female")
europe <- c("Belarus", "Belgium", "Bulgaria",
"Croatia", "CzechRepublic", "Estonia", "France",
"Germany", "Hungary", "Ireland", "Italia", "Latvia", "Lithuania",
"Luxembourg", "Netherlands", "Poland", "Portugal", "Romania",
"Slovakia", "Slovenia", "Spain")
health <- c("Excellent", "Good", "Fair", "Poor")
education <- c("High School", "Bachelor's", "Master's", "PhD")
n <- 10
wdat <- tibble(
ID = sprintf("Ind%02i", 1:n), # IDs
a_sex = sample(sex, n, replace = TRUE),
a_countryofbirth = sample(europe, n, replace = TRUE),
a_health = sample(health, n, replace = TRUE),
a_educationstatus = sample(education, 10, replace = TRUE),
b_sex = sample(sex, n, replace = TRUE),
b_countryofbirth = sample(europe, n, replace = TRUE),
b_health = sample(health, n, replace = TRUE),
b_educationstatus = sample(education, 10, replace = TRUE),
c_sex = sample(sex, n, replace = TRUE),
c_countryofbirth = sample(europe, n, replace = TRUE),
c_health = sample(health, n, replace = TRUE),
c_educationstatus = sample(education, 10, replace = TRUE))
The data wdat
contains a unique ID for each individual and then three blocks of columns.
With this data, the syntax to transform it to a "long" format can be done with the pivot_longer
function like so
wdat %>%
pivot_longer(
-ID,
names_to = c("wave", ".value"),
names_pattern = "(.)_(.*)"
)
where
names_pattern = "(.)_(.*)"
means that there are two important pieces of information in the column name, a single character first and a string, separated with _
,names_to = c("wave", ".value")
means that the single character will go in a column called wave
, and the value in the wide columns will go in columns whose names are based on the common pattern, e.g., all the values in a_sex
, b_sex
and c_sex
will go in a column called sex
EDIT: using names_sep
is much easier in this case
wdat %>%
pivot_longer(
-ID,
names_sep = "_",
names_to = c("wave", ".value")
)