My data frame is so dirty, it put my interesting variable together.
eid edu
1 1009467 "A levels/AS levels or equivalent"
2 1016906 "A levels/AS levels or equivalent"
3 1018742 "A levels/AS levels or equivalent"
4 1030778 "A levels/AS levels or equivalent","CSEs or equivalent"
5 1030785 "A levels/AS levels or equivalent","CSEs or equivalent"
or you can copy it:
structure(list(n = 399:401, edu = c("\"A levels/AS levels or equivalent\",\"College or University degree\",\"CSEs or equivalent\",\"NVQ or HND or HNC or equivalent\"",
"\"A levels/AS levels or equivalent\",\"College or University degree\",\"CSEs or equivalent\",\"NVQ or HND or HNC or equivalent\",\"O levels/GCSEs or equivalent\"",
"\"A levels/AS levels or equivalent\",\"College or University degree\",\"CSEs or equivalent\",\"NVQ or HND or HNC or equivalent\",\"O levels/GCSEs or equivalent\""
)), row.names = c(NA, 3L), class = "data. Frame")
and it may be contain 8 education level options:
"A levels/AS levels or equivalent",
"College or University degree",
"CSEs or equivalent",
"NVQ or HND or HNC or equivalent",
"O levels/GCSEs or equivalent",
"Other professional qualifications eg: nursing, teaching",
"Prefer not to answer",
"None of the above"
The dirtiest thing is that it may appear at the same time, the first five are selected together.
I want to separate my column of edu according their value to become new variable, and if edu contain it , it will display 1, if not, it will display 0
like it:
eid edu
1 1009467 "A levels/AS levels or equivalent"
2 1016906 "A levels/AS levels or equivalent"
3 1018742 "A levels/AS levels or equivalent"
4 1030778 "A levels/AS levels or equivalent","CSEs or equivalent"
5 1043561 "A levels/AS levels or equivalent","CSEs or equivalent"
A levels CSEs
1 0
1 0
1 0
1 1
1 1
Thank you!
Try this:
library(dplyr) #>= 1.1.0
library(tidyr)
df %>%
separate_rows(edu, sep = ",") %>%
mutate(edu = gsub("^\"|\"$", "",edu)) %>%
mutate(A_levels = ifelse(edu %in% "A levels/AS levels or equivalent", 1, 0),
College_Uni = ifelse(edu %in% "College or University degree", 1, 0),
CSEs = ifelse(edu %in% "CSEs or equivalent", 1, 0),
NVQ_HND_HNC = ifelse(edu %in% "NVQ or HND or HNC or equivalent", 1, 0),
O_levels_GCSEs = ifelse(edu %in% "O levels/GCSEs or equivalent", 1, 0),
Other_prof_qual = ifelse(edu %in% "Other professional qualifications eg: nursing, teaching", 1, 0),
Prefer_not_to_answer = ifelse(edu %in% "Prefer not to answer", 1, 0),
None_of_the_above = ifelse(edu %in% "None of the above", 1, 0)) %>%
summarise(across(everything(), ~max(.)), .by = eid)
eid edu A_levels Colleg…¹ CSEs NVQ_H…² O_lev…³ Other…⁴ Prefe…⁵ None_…⁶
<int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1009467 A levels/AS levels or equivalent 1 0 0 0 0 0 0 0
2 1016906 A levels/AS levels or equivalent 1 0 0 0 0 0 0 0
3 1018742 A levels/AS levels or equivalent 1 0 0 0 0 0 0 0
4 1030778 CSEs or equivalent 1 0 1 0 0 0 0 0
5 1030785 CSEs or equivalent 1 0 1 0 0 0 0 0
# … with abbreviated variable names ¹College_Uni, ²NVQ_HND_HNC, ³O_levels_GCSEs, ⁴Other_prof_qual,
# ⁵Prefer_not_to_answer, ⁶None_of_the_above