r dataframe split duplicates alphanumeric

Find length and placement of alphabet characters in character string R

I have a dataframe composed of 3 columns:

df <- data.frame(column1 = c("ab 34, 35, 36", "cb 23", "df 45, gh 46", "gh 21"),
                 column2 = c("ID_27", "ID_28", "ID_29", "ID_30"),
                 column3 = c("area51", "area52", "area53", "area54"))
> df
        column1 column2 column3
1 ab 34, 35, 36   ID_27  area51
2         cb 23   ID_28  area52
3  df 45, gh 46   ID_29  area53
4         gh 21   ID_30  area54

I need to identify the length of the letter prefixes in column1. This means being able to identify and filter rows similar to row 3 here which contains more than two letters, but in a large data set. Eg (output after filter).

column1 column2 column3
3 df 45, gh 46   ID_29  area53

Ideally I would also like to get the position of the first letter in each prefix. This could be achieved by putting the length and position values into two additional but separate columns in the dataframe, or by filtering it. Either way which achieves the same result is fine. Both tidyverse and older approaches welcome.

If anyone wants to go further, the idea is to attach the two letter prefix to the number codes which follow it, up until the letter prefix changes like the example above. The following step is to then paste or copy these new values in separate rows while simultaneously duplicating the values in the other columns.

Solution

df <- data.frame(column1 = c("ab 34, 35, 36", "cb 23", "df 45, gh 46", "gh 21"),
                 column2 = c("ID_27", "ID_28", "ID_29", "ID_30"),
                 column3 = c("area51", "area52", "area53", "area54"))

library(dplyr)
library(stringr)

df %>% filter(str_count(column1, "[a-z]+") > 1)

# A tibble: 1 x 3
# Groups:   column3 [1]
  column1      column2 column3
  <fct>        <fct>   <fct>  
1 df 45, gh 46 ID_29   area53