
Split a string, tokenize substrings, and convert tokens to numeric vectors


I have a character string:

String <- "268.1,271.1,280.9,294.7,285.6,288.6,384.4\n124.8,124.2,116.2,117.7,118.3,122.0,168.3\n18,18,18,18,18,18,18"

I would like to split it into three substrings based on \n.

I did that using the following code:

strsplit(String, "\n")

It resulted in three substrings.

  1. How can I get three separate substrings so that I can use each one as a vector for calculations?

  2. How can I tokenize the substrings, to create vectors of numeric values?


Solution

  • Here's an approach with base R. strsplit is a little tricky: it returns a list (one element per input string), but it only accepts a character vector as input, so it cannot be applied directly to its own list output.

    1. As you suggest in your question, use strsplit with split = "\n" to split into a list of 3 strings.

    2. Use unlist to change that list into a vector of 3 character strings.

    3. Use strsplit again with split = "," to create a list of 3 character vectors.

    4. Use lapply to convert those character vectors into numeric vectors.
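    The four steps can also be written out one at a time, which makes the intermediate types easier to see. This is only a sketch: the names rows_list, rows, tokens and values are illustrative and do not come from the question.

```r
String <- "268.1,271.1,280.9,294.7,285.6,288.6,384.4\n124.8,124.2,116.2,117.7,118.3,122.0,168.3\n18,18,18,18,18,18,18"

# Step 1: split on newlines. The result is a list of length 1 whose
# single element is a character vector of 3 strings.
rows_list <- strsplit(String, "\n")

# Step 2: flatten the list into a plain character vector of 3 strings.
# (strsplit() needs a character vector, so rows_list cannot be passed to it as-is.)
rows <- unlist(rows_list)

# Step 3: split each string on commas. The result is a list of
# 3 character vectors, one per row.
tokens <- strsplit(rows, ",")

# Step 4: convert each character vector to a numeric vector.
values <- lapply(tokens, as.numeric)

str(values)
```

    Nesting these calls inside one another gives the one-liner shown next.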

    lapply(strsplit(unlist(strsplit(String, "\n")), ","), as.numeric)
    [[1]]
    [1] 268.1 271.1 280.9 294.7 285.6 288.6 384.4
    
    [[2]]
    [1] 124.8 124.2 116.2 117.7 118.3 122.0 168.3
    
    [[3]]
    [1] 18 18 18 18 18 18 18
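
    To use each vector in calculations (the first part of the question), one option is to store the list and pull out its elements with [[ ]]. The sketch below assumes the list is saved as result; that name and the variables first, second and third are illustrative. The last line assumes every row has the same number of values, which holds for this input.

```r
String <- "268.1,271.1,280.9,294.7,285.6,288.6,384.4\n124.8,124.2,116.2,117.7,118.3,122.0,168.3\n18,18,18,18,18,18,18"

# Store the list of numeric vectors produced by the one-liner.
result <- lapply(strsplit(unlist(strsplit(String, "\n")), ","), as.numeric)

# Extract individual vectors with double brackets.
first  <- result[[1]]
second <- result[[2]]
third  <- result[[3]]

# Each one is an ordinary numeric vector, so arithmetic is element-wise.
ratio <- first / second
mean(first)

# Apply the same function to every vector at once.
row_means <- sapply(result, mean)

# Because all three vectors have 7 values, they can be stacked into a 3 x 7 matrix.
m <- do.call(rbind, result)
dim(m)
```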