regex, r, text-segmentation

Split column by last word in sentence


YARQ (Yet another regex question).

How would I go about splitting the following into two columns, so that the last column contains the last word of each sentence and the first column contains everything else?

x <- c("This is a test",
       "Testing 1,2,3 Hello",
       "Foo Bar",
       "Random 214274(%*(^(* Sample",
       "Some Hyphenated-Thing"
       )

Such that I end up with:

col1                         col2
This is a                    test
Testing 1,2,3                Hello
Foo                          Bar
Random 214274(%*(^(*         Sample
Some                         Hyphenated-Thing

Solution

  • This looks like a job for a lookahead. We split on a space that is followed only by non-space characters up to the end of the string, which is the last space in each sentence.

    split <- strsplit(x, " (?=[^ ]+$)", perl=TRUE)
    matrix(unlist(split), ncol=2, byrow=TRUE)
    
         [,1]                   [,2]              
    [1,] "This is a"            "test"            
    [2,] "Testing 1,2,3"        "Hello"           
    [3,] "Foo"                  "Bar"             
    [4,] "Random 214274(%*(^(*" "Sample"          
    [5,] "Some"                 "Hyphenated-Thing"
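
If named columns are wanted, as in the question, the same split can be collected into a data frame. This is only a sketch: the names `parts` and `result` are made up for illustration, and it assumes each string contains at least one space (a single-word string would end up with `NA` in `col2`).

```r
x <- c("This is a test",
       "Testing 1,2,3 Hello",
       "Foo Bar",
       "Random 214274(%*(^(* Sample",
       "Some Hyphenated-Thing")

# Split at the last space only: the lookahead requires that nothing but
# non-space characters remain between that space and the end of the string.
parts <- strsplit(x, " (?=[^ ]+$)", perl = TRUE)

# Take the first and second piece of every split result.
# Assumption: each element of `parts` has two pieces; a missing second
# piece would come back as NA.
result <- data.frame(
  col1 = vapply(parts, `[`, character(1), 1),
  col2 = vapply(parts, `[`, character(1), 2),
  stringsAsFactors = FALSE
)

print(result)
```

Using `vapply` with `character(1)` rather than `matrix(unlist(...))` keeps the rows aligned with the input even when a string does not split into exactly two pieces.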