r escaping strsplit

Split string at a vertical bar character "|"


I feel like this question is asked a lot, but none of the solutions I found work for me.

I have a dataframe with a column (called ID) that holds a string of numbers and letters (e.g. Q8A203). In a few rows there are two of these IDs separated by a vertical bar (e.g. Q8AA66|Q8AAT5). For my analysis it doesn't matter which one I keep, so I want to make a new column named NewColumn that holds only the first one, by splitting the string at |.

I know that the vertical bar must be treated differently and that I have to put \\ in front. I tried strsplit() and unlist():

df$NewColumn <- strsplit(df$ID,split='\\|',fixed=TRUE)
df$NewColumn <- unlist(strsplit(df$ID, " \\| ", fixed=TRUE))

Both options just copy the content of column ID into NewColumn unchanged.

I would very much appreciate the help.


Solution

  • Rather than splitting, you can simply replace the bar and everything after it with nothing, which keeps the first ID.

    df <- data.frame(ID = c("Q8A203", "Q8AA66|Q8AAT5"))
    df$NewColumn <- sub("\\|.*$", "", df$ID)
    df
    #              ID NewColumn
    # 1        Q8A203    Q8A203
    # 2 Q8AA66|Q8AAT5    Q8AA66
    

    Next time, please add a minimal reproducible example (your df here) to speed up answers ;)

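    strsplit can also work, but not with the patterns you used. With fixed=TRUE the pattern is taken literally, so '\\|' looks for a backslash followed by a bar, which never occurs in your IDs (and your second attempt also requires spaces around the bar). Either remove the fixed option and keep the regex '\\|', or keep fixed=TRUE and pass a plain '|'. Note that strsplit returns a list, so you need an extra step to pull the first element out of each entry, which is more complex.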

    # Working with a list
    unlist(lapply(strsplit(df$ID, split='\\|'), "[[", 1))
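    To make the fixed=TRUE behaviour concrete, here is a minimal sketch on the same two-row example data. The names `unchanged` and `parts` are only illustrative, and using vapply instead of lapply plus unlist is my own choice here, not something the question requires.

```r
# Example data, mirroring the data frame used in the answer above
df <- data.frame(ID = c("Q8A203", "Q8AA66|Q8AAT5"), stringsAsFactors = FALSE)

# fixed = TRUE treats the pattern literally: "\\|" is a backslash followed by
# a bar, which never occurs in ID, so nothing is split
unchanged <- unlist(strsplit(df$ID, split = "\\|", fixed = TRUE))

# With fixed = TRUE, pass the bare bar instead; no escaping is needed
parts <- strsplit(df$ID, split = "|", fixed = TRUE)

# Keep the first piece of each element; vapply checks that every result
# is a single character value
df$NewColumn <- vapply(parts, `[`, character(1), 1)
df
```

    Here `unchanged` should come back identical to df$ID, which matches what the question observed, while NewColumn should hold only the first ID of each row.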