I have a variable that I would like to split. The values differ from row to row, but each one has either two string expressions separated by a ",", three string expressions separated by a ",", one string expression, or nothing at all.
For example:
indel
row1 +1C
row2 +1C,+2CC
row3 0
row4 +1C,+2CC,-1C
Essentially, what I want to do is create three new variables, one for each of the (up to) three string expressions. Of course, some rows will have only two, one, or none.
I have been able to split the variable and create two new variables for the first two string expressions using:
mito$indel1 <- sapply(strsplit(as.character(mito$indel),","),function(x) x[1])
mito$indel2 <- sapply(strsplit(as.character(mito$indel),","),function(x) x[2])
But of course, there is a third string expression. I was thinking of creating a temporary indel2 variable and then splitting it again to get the third, but the problem with the R script above is that it creates the variables as:
indel indel1 indel2
row1 +1C +1C NA
row2 +1C,+2CC +1C +2CC
row3 0 0 NA
row4 +1C,+2T,-1C +1C +2T
I'm sure this has to do with the second "," in the string and R getting confused. But is there a way to overcome this without having to edit the variable by hand for each row?
I've also tried the following with no luck:
mito$indel2 <- sapply(strsplit(sapply(strsplit(as.character(mito$indel),","),function(x) x[2]),","),function(x) x[1])
mito$indel3 <- sapply(strsplit(sapply(strsplit(as.character(mito$indel),","),function(x) x[2]),","),function(x) x[2])
Any help will be greatly appreciated.
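A minimal sketch, assuming `mito$indel` is a character (or factor) column holding strings like the ones above: `strsplit()` splits on every comma, not only the first, so each row's vector already contains all of its pieces. The third expression is therefore just `x[3]` in the same `sapply()` pattern, and indexing past the end of a shorter vector returns `NA`. This is also why splitting `x[2]` a second time cannot recover the third piece: `x[2]` no longer contains any commas.

```r
# Sample data from the question (assumed layout: one character column "indel")
mito <- data.frame(indel = c("+1C", "+1C,+2CC", "0", "+1C,+2CC,-1C"),
                   stringsAsFactors = FALSE)

# strsplit() splits on every comma, returning one character vector per row
parts <- strsplit(as.character(mito$indel), ",")

# Indexing past the end of a vector yields NA, so shorter rows get NA
mito$indel1 <- sapply(parts, function(x) x[1])
mito$indel2 <- sapply(parts, function(x) x[2])
mito$indel3 <- sapply(parts, function(x) x[3])

print(mito)
```

Storing the split result once in `parts` avoids calling `strsplit()` separately for each new column.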
You could also use read.table for this:
read.table(text=as.character(dat$V1), sep=',', fill=TRUE, as.is=TRUE)
# V1 V2 V3
# 1 +1C
# 2 +1C +2CC
# 3 0
# 4 +1C +2CC -1C
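One caveat, hedged: the `read.table` documentation notes that the number of columns is determined from the first five lines of input (or from `col.names` if that is longer), so with `fill=TRUE` the column count can come out wrong when none of the first five rows contains three expressions. A sketch that supplies `col.names` to fix the count at three and binds the result back onto the question's `mito` data frame (the names `indel1` to `indel3` are my own choice). As in the output above, missing pieces come back as empty strings rather than `NA`.

```r
# Sample data from the question (assumed layout: one character column "indel")
mito <- data.frame(indel = c("+1C", "+1C,+2CC", "0", "+1C,+2CC,-1C"),
                   stringsAsFactors = FALSE)

# Supplying col.names fixes the number of columns at three,
# regardless of how many fields the first five lines contain
split_cols <- read.table(text = as.character(mito$indel), sep = ",",
                         fill = TRUE, as.is = TRUE,
                         col.names = c("indel1", "indel2", "indel3"))

# Attach the new columns to the original data frame
mito <- cbind(mito, split_cols)

print(mito)
```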