I am currently working on a project where I have to split a variable into two parts. I have already looked at similar problems, but they have not helped me.
The original data set was created with SPSS. Here is a short variant to illustrate this.
df <- data.frame(code = c("013101", "013102", "013205", "114113"), s01_01 = c(1, 4, 2, 3), s01_02 = c(4, 3, 2, 4))
The variable "code" is the child's code, stored as a character variable because of the leading zero. The other two variables are example questions that the children answered. The first two digits of "code" (e.g. 01) denote the school, and digits three and four (e.g. 31) denote the class. The last two digits distinguish the children within a class.
I would now like to split the variable "code" into a variable "school" and a variable "class".
My best option so far has been the "data_separate" function.
library(datawizard)
df <-
  data_separate(
    df,
    select = code,
    new_columns = c("school", "class"),
    separator = 3,
    append = TRUE
  )
Now I have separated the school, but the variable "class" still contains both the class and the individual child.
I could not find a way to create two variables that are cut at different positions. Most solutions built around the separate function assume a known pattern to split on, but I do not have a separator like "-" or anything similar. So how can I tell R exactly where I want the variable to be split?
You say that you are trying to split code into two variables, but from your description, you are actually trying to split it into 3 variables: school, class, and child.
You can use tidyr::separate to do this, creating 3 new columns and splitting code into 3 variables by cutting it after the second and fourth characters.
tidyr::separate(df, code, into = c('school', 'class', 'child'), sep = c(2, 4))
#>   school class child s01_01 s01_02
#> 1     01    31    01      1      4
#> 2     01    31    02      4      3
#> 3     01    32    05      2      2
#> 4     11    41    13      3      4
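If you would rather avoid an extra package, the same positional split can be sketched in base R with substr, which extracts characters by start and stop position. This is only a minimal sketch and assumes that every code is exactly six characters long, as in your example data. Unlike the tidyr call above, it keeps the original code column.

```r
# Same example data as in the question
df <- data.frame(
  code   = c("013101", "013102", "013205", "114113"),
  s01_01 = c(1, 4, 2, 3),
  s01_02 = c(4, 3, 2, 4)
)

# substr(x, start, stop) extracts characters by position (1-based, inclusive).
# Assumption: every code has exactly six characters.
df$school <- substr(df$code, 1, 2)  # characters 1-2: school
df$class  <- substr(df$code, 3, 4)  # characters 3-4: class
df$child  <- substr(df$code, 5, 6)  # characters 5-6: child within the class

df
```

The new columns stay character, so the leading zeros are preserved. If you only need school and class, you can simply leave out the child line.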