
Extract the word before the last two spaces in each comma-separated part of a string in R


I have several strings like

"AAA BBB CCC 1X2L BOT BR, DDD EEE FFF 3X4L BOT BR, GGG 5X6L BOT BR"

I just want to extract the word that sits before the last two spaces in each comma-separated part, i.e., I want

"1X2L, 3X4L, 5X6L"

only.

How can I achieve this in R?


Solution

  • You can try using sub on each piece after splitting the string on a comma followed by whitespace (',\\s').

    x <- "AAA BBB CCC 1X2L BOT BR, DDD EEE FFF 3X4L BOT BR, GGG 5X6L BOT BR"
    sub('.*?(\\w+)\\s\\w+\\s\\w+$', '\\1', strsplit(x, ',\\s')[[1]])
    #[1] "1X2L" "3X4L" "5X6L"
    

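    .*? - lazily matches as few characters as possible, until the rest of the pattern can match

    (\\w+) - a capture group for the word that we want, followed by

    \\s - a whitespace character, followed by

    \\w+ - a word, followed by

    \\s\\w+$ - another whitespace character and a final word at the end of the string ($).

    The replacement '\\1' keeps only the captured word.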
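
    If the regular expression is hard to read, the same idea can be sketched without it: split each comma-separated part into words and pick the third word from the end. This is only an illustrative alternative, not part of the original answer. The names parts, words, codes and result are hypothetical, and the sketch assumes every part contains at least three words. The last step collapses the vector into the single string shown in the question.

```r
x <- "AAA BBB CCC 1X2L BOT BR, DDD EEE FFF 3X4L BOT BR, GGG 5X6L BOT BR"

# Split into comma-separated parts, then split each part into words.
parts <- strsplit(x, ",\\s*")[[1]]
words <- strsplit(parts, "\\s+")

# Take the third word from the end of each part
# (assumption: each part has at least three words).
codes <- vapply(words, function(w) w[length(w) - 2], character(1))

# Collapse the vector into one comma-separated string.
result <- toString(codes)
result
#[1] "1X2L, 3X4L, 5X6L"
```

    toString(codes) is equivalent to paste(codes, collapse = ", "), so it can also be applied to the output of the sub call above.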