
Splitting a comma-separated string of mixed text and numbers with strsplit in R


I have many strings of the form "name1, name2 and name3, 0, 1, 2" or "name1, name2, name3 and name4, 0, 1, 2", and I would like to split each one into 4 elements, where the first element is the whole text string of names. The problem is that strsplit doesn't differentiate between text and numbers, so it splits the string into 5 elements in the first case and into 6 elements in the second. How can I tell R to keep the text part of the string together when the number of names varies?
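A minimal reproduction of the problem, using the two example strings from the question (the variable names x and pieces are illustrative): a plain split on "," treats the commas between names exactly like the commas before the numbers, so the number of pieces grows with the number of names.

```r
# The two example strings from the question; `x` is an illustrative name
x <- c("name1, name2 and name3, 0, 1, 2",
       "name1, name2, name3 and name4, 0, 1, 2")

# A plain split on "," treats every comma the same way
pieces <- strsplit(x, ",")

# Number of pieces per string: 5 for the first, 6 for the second
lengths(pieces)
```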


Solution

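  • You have two main options:
    (1) use a regular expression (grep) to match the numbers and extract them;
    (2) split on the comma, then coerce each piece to numeric and check for NAs.

    I prefer the second. Pieces that are not numbers become NA when coerced, so the NAs mark the text part: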

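    x <- "name1, name2 and name3, 0, 1, 2"
    # Split on each comma together with any spaces that follow it
    splat <- strsplit(x, ",\\s*")[[1]]
    # TRUE where a piece parses as a number; the NA warnings are expected
    numbs <- !is.na(suppressWarnings(as.numeric(splat)))

    # Rejoin the text pieces into one element and keep the numbers as they are
    c(paste(splat[!numbs], collapse = ", "), splat[numbs])
    # result: c("name1, name2 and name3", "0", "1", "2")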
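The snippet above handles a single string (note the [[1]]). Since the question mentions many strings, here is a minimal sketch that wraps the same split-and-coerce steps in a function and applies it to every element with lapply. The helper name split_names_numbers is hypothetical, and the final rbind assumes every string ends with the same number of numeric fields (three here); with differing counts, keep the list instead.

```r
# Hypothetical helper: the split-and-coerce approach for one string
split_names_numbers <- function(s) {
  # Split on a comma and any whitespace that follows it
  splat <- strsplit(s, ",\\s*")[[1]]
  # TRUE where a piece parses as a number (coercion warnings suppressed)
  numbs <- !is.na(suppressWarnings(as.numeric(splat)))
  # Rejoin the text pieces into one element, keep the numbers as-is
  c(paste(splat[!numbs], collapse = ", "), splat[numbs])
}

x <- c("name1, name2 and name3, 0, 1, 2",
       "name1, name2, name3 and name4, 0, 1, 2")

# One character vector per input string
parts <- lapply(x, split_names_numbers)

# Assumes every string yields the same number of pieces, so rows line up
m <- do.call(rbind, parts)
m
```

One caveat of the coercion test: pieces such as "Inf" or "1e3" also parse as numbers, so a name spelled that way would be counted as a number.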
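For comparison, option (1) can be sketched with a single regular expression that captures the text prefix and the trailing run of numbers. This assumes every string ends with at least one ", <number>" group and that the numbers are unsigned integers or decimals; sub returns non-matching strings unchanged, so those would need separate handling. The variable names pattern, names_part, tail_part and nums are illustrative.

```r
x <- c("name1, name2 and name3, 0, 1, 2",
       "name1, name2, name3 and name4, 0, 1, 2")

# Lazy text prefix, then a trailing run of ", <number>" groups up to the end.
# Assumes the numbers are unsigned integers or decimals such as "0" or "1.5".
pattern <- "^(.*?)((,\\s*[0-9]+(\\.[0-9]+)?)+)$"

# Group 1 holds the names; perl = TRUE selects PCRE for the lazy "*?"
names_part <- sub(pattern, "\\1", x, perl = TRUE)

# Group 2 holds the numbers; drop its leading comma, then split the rest
tail_part <- sub(pattern, "\\2", x, perl = TRUE)
nums <- lapply(strsplit(sub("^,\\s*", "", tail_part), ",\\s*"), as.numeric)

names_part
nums
```

The lazy prefix stops at the first comma from which only numbers remain, so commas between names stay inside the text part.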