
Split Character Strings with different combinations based on letter in R


I have a large data frame with a character column that contains different combinations of strings.

For example

**Column1**
1.0.01.01 
1.02.04.03 | E1.3  
G1.2 | 5.01.03.2
30.02.01.04.02 
I.1
10.04.03 | H1.256

The only values I am interested in are the ones starting with a letter. My desired output should look like this:

**Column1**
NA
E1.3  
G1.2
NA
I.1
H1.256

Testdata:

df <- structure(list(Column1 = c("1.0.01.01", "1.02.04.03 | E1.3",
"G1.2 | 5.01.03.2", "30.02.01.04.02", "I.1", "10.04.03 | H1.256")), 
class = "data.frame", row.names = c(NA, -6L)) 

I suspect the solution is simple with grepl or a similar function, but at the moment I can't see where to start.


Solution

  • You can try this approach, assuming df is your data frame and Column1 is your column name.

    stringr::str_extract(df$Column1, '[a-zA-Z]+\\d*\\.\\d+')
    

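    The pattern reads as follows: [a-zA-Z]+ matches one or more letters, \\d* matches zero or more digits, \\. matches a literal dot, and \\d+ matches one or more digits. str_extract returns the first such match in each string, or NA when there is none.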

    Output:

    [1] NA       "E1.3"   "G1.2"   NA       "I.1"    "H1.256"
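
If you would rather not depend on stringr, the same pattern can probably be applied with base R. The following is only a sketch under that assumption: it rebuilds the test data from the question, and the names pattern, m, hit and result are illustrative, not part of the original answer.

```r
# Test data from the question
df <- data.frame(Column1 = c("1.0.01.01", "1.02.04.03 | E1.3",
                             "G1.2 | 5.01.03.2", "30.02.01.04.02",
                             "I.1", "10.04.03 | H1.256"))

# Same pattern as in the stringr call above
pattern <- "[a-zA-Z]+\\d*\\.\\d+"

# regexpr() gives the position of the first match per string, or -1 if none
m <- regexpr(pattern, df$Column1, perl = TRUE)

# regmatches() returns only the matched substrings, so start from a vector
# of NAs and fill in the rows that actually had a match
hit <- !is.na(m) & m != -1
result <- rep(NA_character_, nrow(df))
result[hit] <- regmatches(df$Column1, m)

result
# [1] NA       "E1.3"   "G1.2"   NA       "I.1"    "H1.256"
```

Like str_extract, this keeps only the first match in each string, which is enough for the example data because every row contains at most one value starting with a letter.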