
Using strsplit to split a variable three ways


I have a variable that I would like to split. Each row is different: it contains one string expression, two or three string expressions separated by ",", or nothing at all.

For Example:

     indel
row1 +1C
row2 +1C,+2CC
row3 0
row4 +1C,+2CC,-1C

Essentially, I want to create three variables, one for each of the three possible string expressions. Of course, some rows will have only two, one, or none.

I have been able to split the variable and create two variables for the first two string expressions using:

mito$indel1 <- sapply(strsplit(as.character(mito$indel),","),function(x) x[1])
mito$indel2 <- sapply(strsplit(as.character(mito$indel),","),function(x) x[2])

But of course, there is a third string expression. I was thinking of creating a temporary indel2 variable and then splitting it again to make the third, but the problem with the R script above is that it creates the variables as:

     indel         indel1    indel2
row1 +1C           +1C       NA
row2 +1C,+2CC      +1C       +2CC
row3 0             0         NA
row4 +1C,+2T,-1C   +1C       +2T

I'm sure this has to do with the second "," in the string, and R is getting confused. Is there a way to overcome this without having to edit the variable for each row?

I've also tried the following with no luck:

mito$indel2 <- sapply(strsplit(sapply(strsplit(as.character(mito$indel),","),function(x) x[2]),","),function(x) x[1])
mito$indel3 <- sapply(strsplit(sapply(strsplit(as.character(mito$indel),","),function(x) x[2]),","),function(x) x[2])

Any help will be greatly appreciated.


Solution

  • You could also use read.table for this (dat$V1 here plays the role of mito$indel in the question).

    read.table(text=as.character(dat$V1), sep=',', fill=TRUE, as.is=TRUE)
    #    V1   V2  V3
    # 1 +1C         
    # 2 +1C +2CC    
    # 3   0         
    # 4 +1C +2CC -1C
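
For comparison, here is a minimal sketch of the strsplit route from the question, next to a read.table call with explicit column names. It relies on strsplit splitting on every comma, so the third piece is simply x[3], which is NA for rows with fewer than three parts. The mito data frame is rebuilt here from the example rows, and the column names indel1 to indel3 and the object name wide are assumptions for illustration, not part of the original answer.

```r
# Hypothetical data frame mirroring the example rows in the question
mito <- data.frame(indel = c("+1C", "+1C,+2CC", "0", "+1C,+2CC,-1C"),
                   stringsAsFactors = FALSE)

# Split once on every comma, then pick the i-th piece of each row;
# indexing past the end of a character vector yields NA
parts <- strsplit(as.character(mito$indel), ",", fixed = TRUE)
for (i in 1:3) {
  mito[[paste0("indel", i)]] <- vapply(parts, function(x) x[i], character(1))
}

# read.table variant: supplying col.names fixes the column count at three
wide <- read.table(text = as.character(mito$indel), sep = ",", fill = TRUE,
                   as.is = TRUE, col.names = paste0("indel", 1:3))
```

Passing col.names is a defensive choice: read.table infers the number of columns from the first five lines of input, so a three-part row that first appears further down a longer data set might otherwise not get its own third column. Note also that the strsplit version marks missing pieces as NA, whereas read.table with fill=TRUE leaves them as empty strings in character columns.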