I feel like this question gets asked a lot, but none of the solutions I found work for me.
I have a data frame with a column called ID that holds a string of numbers and letters (e.g. Q8A203). In a few rows there are two of these IDs separated by a vertical bar (e.g. Q8AA66|Q8AAT5). For my analysis it doesn't matter which one I keep, so I wanted to split the string at | and copy the first part into a new column named NewColumn.
I know that the vertical bar is a special character in regular expressions and has to be escaped with \\ in front. I tried strsplit() and unlist():
df$NewColumn <- strsplit(df$ID,split='\\|',fixed=TRUE)
df$NewColumn <- unlist(strsplit(df$ID, " \\| ", fixed=TRUE))
Both options just copy the unchanged contents of ID into NewColumn.
I would very much appreciate the help.
Rather than splitting, you can simply replace the second part with an empty string, which keeps only the first ID.
df <- data.frame(ID = c("Q8A203", "Q8AA66|Q8AAT5"))
df$NewColumn <- sub("\\|.*$", "", df$ID)
df
#              ID NewColumn
# 1        Q8A203    Q8A203
# 2 Q8AA66|Q8AAT5    Q8AA66
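The only character in that pattern that needs special treatment is the bar, because | means "or" in a regular expression. A minimal sketch (the vector name `ids` is just for illustration) showing that escaping it as `\\|` or wrapping it in a character class `[|]` gives the same result:

```r
ids <- c("Q8A203", "Q8AA66|Q8AAT5")

# "\\|" in R source is the regex \| : a literal bar, then everything after it
escaped <- sub("\\|.*$", "", ids)

# Inside a character class the bar has no special meaning, so no escape is needed
bracketed <- sub("[|].*$", "", ids)

escaped
# [1] "Q8A203" "Q8AA66"
identical(escaped, bracketed)
# [1] TRUE
```

Rows without a bar are left untouched because the pattern simply does not match there.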
Next time, please add a minimal reproducible example (your df here) to speed up answers ;)
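strsplit() can work too if you drop fixed = TRUE: with fixed = TRUE the pattern is taken literally, so '\\|' only matches a backslash followed by a bar, which never occurs in your IDs. Without it, the pattern must be a valid regex, so the bar still has to be escaped (and the spaces in " \\| " have to go). Also, strsplit() returns a list, so you then need an extra step to pull out the first element of each piece, which is more complex.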
# Working with a list
unlist(lapply(strsplit(df$ID, split='\\|'), "[[", 1))
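A short sketch confirming that behaviour, plus a variant that keeps fixed = TRUE by passing the bare "|" instead of an escaped one. Using `vapply()` in place of `unlist(lapply(...))` is my own choice, not part of the answer above; it guarantees exactly one string per row. `stringsAsFactors = FALSE` only matters on R versions before 4.0, where strsplit() would otherwise reject the factor column.

```r
# stringsAsFactors = FALSE is only needed on R < 4.0 (factors break strsplit)
df <- data.frame(ID = c("Q8A203", "Q8AA66|Q8AAT5"), stringsAsFactors = FALSE)

# With fixed = TRUE, '\\|' is the two-character string \| matched literally,
# so no ID is split and every element keeps the whole original string
strsplit(df$ID, split = "\\|", fixed = TRUE)

# A bare "|" is matched literally when fixed = TRUE, so no escaping is needed
parts <- strsplit(df$ID, split = "|", fixed = TRUE)

# Take the first piece of each split; vapply checks each result is one string
df$NewColumn <- vapply(parts, `[`, character(1), 1)
df
```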