My datafile contains a variable with responses to several questions.
The structure is:
ID response
1 BCCAD
2 ABCCD
3 BA.DC
.....
I want to split each response into separate variables, q1, q2, ..., one per question:
ID q1 q2 q3 q4 q5
1 B C C A D
2 A B C C D
3 B A . D C
....
I tried the following code:
v <- rep("q",5)
z <- as.character(1:5)
paste(v,z,sep="")
for(i in 1:20){
f[i]<- substr(response,i,i)
}
But it only replaces the variable names in the vector.
What I want is to create as many variables as needed to store the values for each question. The variables should be named with a common root, "q", and a numeric suffix showing the position within the string.
Several other options:
1) The separate function from the tidyr package:
library(tidyr)
# notation 1:
separate(d, col=response, into=paste0('q',1:5), sep=1:4)
# notation 2:
d %>% separate(col=response, into=paste0('q',1:5), sep=1:4)
2) The tstrsplit function from the data.table package:
library(data.table)
setDT(d)[, paste0('q',1:5) := tstrsplit(response, split = '')][, response := NULL][]
3) The cSplit function from the splitstackshape package, in combination with setnames from data.table:
library(splitstackshape)
setnames(cSplit(d, 'response', sep='', stripWhite=FALSE), 2:6, paste0('q',1:5))[]
which all give the same result:
ID q1 q2 q3 q4 q5
1 1 B C C A D
2 2 A B C C D
3 3 B A . D C
Used data:
d <- structure(list(ID = 1:3, response = c("BCCAD", "ABCCD", "BA.DC")), .Names = c("ID", "response"), class = "data.frame", row.names = c(NA, -3L))
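For comparison, the same split can probably be done without any extra packages. The following is only a minimal base R sketch, assuming every response string has the same length (otherwise rbind would recycle values and give a warning); the names chars, m and res are just illustrative:

```r
# Example data, same values as in the post
d <- data.frame(ID = 1:3,
                response = c("BCCAD", "ABCCD", "BA.DC"),
                stringsAsFactors = FALSE)

# Split each string into single characters (a list of character vectors)
chars <- strsplit(d$response, split = "")

# Stack the vectors into a matrix: one row per ID, one column per question
# (assumes all strings have equal length)
m <- do.call(rbind, chars)
colnames(m) <- paste0("q", seq_len(ncol(m)))

# Combine with the ID column; the original response column is dropped
res <- data.frame(ID = d$ID, m, stringsAsFactors = FALSE)
res
```

Because the column names are built from ncol(m), this should adapt to however many questions the strings contain, as long as the lengths are equal across rows.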