I have a large data frame with a character column containing different combinations of strings.
For example:
**Column1**
1.0.01.01
1.02.04.03 | E1.3
G1.2 | 5.01.03.2
30.02.01.04.02
I.1
10.04.03 | H1.256
The only values I am interested in are the ones starting with a letter. My desired output should look like this:
**Column1**
NA
E1.3
G1.2
NA
I.1
H1.256
Test data:
structure(list(Column1 = c("1.0.01.01", "1.02.04.03 | E1.3",
"G1.2 | 5.01.03.2", "30.02.01.04.02", "I.1", "10.04.03 | H1.256")),
class = "data.frame", row.names = c(NA, -6L))
I guess the solution might be really simple with grepl or similar commands, but at the moment I can't work out where to start.
You can try this approach, assuming df is your data frame and Column1 is your column name.
stringr::str_extract(df$Column1, '[a-zA-Z]+\\d*\\.\\d+')
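The pattern matches one or more letters ([a-zA-Z]+), followed by zero or more digits (\\d*), a literal dot (\\.), and then one or more digits (\\d+). str_extract returns the first match in each string, or NA when there is no match.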
Output:
[1] NA "E1.3" "G1.2" NA "I.1" "H1.256"
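If you would rather avoid the stringr dependency, the same idea can be sketched in base R with regexpr and regmatches. This is only a rough sketch: it reuses the pattern from above, and the names pattern, m and result are placeholders I made up for illustration. It also assumes you want to overwrite Column1 in place.

```r
# Test data from the question
df <- data.frame(
  Column1 = c("1.0.01.01", "1.02.04.03 | E1.3", "G1.2 | 5.01.03.2",
              "30.02.01.04.02", "I.1", "10.04.03 | H1.256"),
  stringsAsFactors = FALSE
)

# Same pattern as in the stringr call above
pattern <- "[a-zA-Z]+\\d*\\.\\d+"

# regexpr() gives the position of the first match per string, or -1 if none
m <- regexpr(pattern, df$Column1)

# Start with all NA, then fill in only the rows that actually matched;
# regmatches() returns the matched substrings for those rows only
result <- rep(NA_character_, nrow(df))
result[m != -1] <- regmatches(df$Column1, m)

# Overwrite the column (assumption: in-place replacement is what you want)
df$Column1 <- result
```

Like str_extract, this keeps only the first match per row, so a row containing two letter-prefixed codes would lose the second one.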