I have a dataframe composed of 3 columns:
df <- data.frame(column1 = c("ab 34, 35, 36", "cb 23", "df 45, gh 46", "gh 21"),
column2 = c("ID_27", "ID_28", "ID_29", "ID_30"),
column3 = c("area51", "area52", "area53", "area54"))
> df
column1 column2 column3
1 ab 34, 35, 36 ID_27 area51
2 cb 23 ID_28 area52
3 df 45, gh 46 ID_29 area53
4 gh 21 ID_30 area54
I need to identify the length of the letter prefixes in column1. This means being able to identify and filter rows similar to row 3 here which contains more than two letters, but in a large data set. Eg (output after filter).
column1 column2 column3
3 df 45, gh 46 ID_29 area53
Ideally I would also like to get the position of the first letter in each prefix. This could be achieved by putting the length and position values into two additional but separate columns in the dataframe, or by filtering it. Either way which achieves the same result is fine. Both tidyverse and older approaches welcome.
If anyone wants to go further, the idea is to attach the two letter prefix to the number codes which follow it, up until the letter prefix changes like the example above. The following step is to then paste or copy these new values in separate rows while simultaneously duplicating the values in the other columns.
df <- data.frame(column1 = c("ab 34, 35, 36", "cb 23", "df 45, gh 46", "gh 21"),
column2 = c("ID_27", "ID_28", "ID_29", "ID_30"),
column3 = c("area51", "area52", "area53", "area54"))
library(dplyr)
library(stringr)
df %>% filter(str_count(column1, "[a-z]+") > 1)
# A tibble: 1 x 3
# Groups: column3 [1]
column1 column2 column3
<fct> <fct> <fct>
1 df 45, gh 46 ID_29 area53