
R: How to create several variables containing substring characters


My datafile contains a variable with responses to several questions.

The structure is:

ID response
1  BCCAD
2  ABCCD
3  BA.DC
.....

I want to separate each response into its own variable, q1, q2, ...:

ID q1 q2 q3 q4 q5
1  B  C  C  A  D
2  A  B  C  C  D
3  B  A  .  D  C
....

I tried the following code

 v <- rep("q",5)
 z <- as.character(1:5)
 paste(v,z,sep="")
 for(i in 1:20){
 f[i]<- substr(response,i,i)
 }

But it only replaces the variable names in the vector.

What I intend is to create as many variables as needed to store the values for each question. The variables should be named with a common root, "q", and a numeric suffix showing the position within the string.


Solution

  • Several other options:

    1) The separate function from the tidyr package:

    library(tidyr)
    # notation 1:
    separate(d, col=response, into=paste0('q',1:5), sep=1:4)
    # notation 2:
    d %>% separate(col=response, into=paste0('q',1:5), sep=1:4)
    

    2) The tstrsplit function from the data.table package:

    library(data.table)
    setDT(d)[, paste0('q',1:5) := tstrsplit(response, split = '')][, response := NULL][]
    

    3) The cSplit function from the splitstackshape package, in combination with setnames from data.table:

    library(splitstackshape)
    setnames(cSplit(d, 'response', sep='', stripWhite=FALSE), 2:6, paste0('q',1:5))[]
    

    All three give the same columns and values (options 2 and 3 return a data.table rather than a data.frame):

      ID q1 q2 q3 q4 q5
    1  1  B  C  C  A  D
    2  2  A  B  C  C  D
    3  3  B  A  .  D  C
    

    Used data:

    d <- structure(list(ID = 1:3,
                        response = c("BCCAD", "ABCCD", "BA.DC")),
                   .Names = c("ID", "response"),
                   class = "data.frame",
                   row.names = c(NA, -3L))
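
For comparison, a base R version that stays close to the loop in the question might look like the sketch below. It is not one of the package-based options above, only an illustration. It assumes the data frame is called d as in the "Used data" block, and that the number of questions can be taken from the longest response string.

```r
# Same example data as in "Used data", built with data.frame() for brevity
d <- data.frame(ID = 1:3,
                response = c("BCCAD", "ABCCD", "BA.DC"),
                stringsAsFactors = FALSE)

# Assumption: the longest response tells us how many questions there are
n <- max(nchar(d$response))

for (i in seq_len(n)) {
  # substr() is vectorised over the strings, so each pass fills a whole
  # column (q1, q2, ...) with the i-th character of every response
  d[[paste0("q", i)]] <- substr(d$response, i, i)
}

d$response <- NULL  # drop the original combined column
d
```

The loop in the question assigned into f[i], an element of a single vector, which is why no new columns appeared. Assigning into d[[paste0("q", i)]] creates one named column per position instead.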