extract integers from characters in R

I am in R. I want to extract just the numbers from df1. I have for example: df1 <- data.frame( column1 = c("Any[12, 15, 20]", "Any[22, 23, 30]"), column2 = c("Any[4, 17]", "Any[]"), stringsAsFactors = F )

And I want a new df, that takes the integers within the brackets muliples by the row number, and keeps the column information corresponding to it.

e.g. new_df could look like

Time	Channel
12	column1
15	column1
20	column1
44	column1
46	column1
60	column1
8	column2
34	column2

I do not need to preserve any "NA" values, e.g If Any[] is empty. Anyone got any idea if this is possible please? I have ENORMOUS amounts of data in this format, so I cannot really do much manually. Cheers!

I already tried: new_df$Time <- as.integer(df1$column1) and that just gave blanks.

I also tried: new_df$Time <- str_extract_all(new_df$Time, "\\d+" ) %>% lapply(function(x) as.integer(x)) %>% sapply(function(x) if.else(length(x) >0, x, NA) )

which only then returned the first integer within each bracket. e.g.

Time	Channel
12	column1
44	column1
8	column2

Solution

library(dplyr)
library(purrr)
library(stringr)
library(tidyr)

df1|>
  mutate(across(everything(), \(x) imap(str_extract_all(x, "\\d+"), ~ as.numeric(.x) * .y))) |>
  pivot_longer(everything(), cols_vary = "slowest", names_to = "Time", values_to = "Channel") |>
  unnest_longer(Channel)

How it works

This is similar to the approach you took in your initial attempt except I am using purrr::imap instead of lapply. The advantage here is that imap gives you access to the list element name (.y), which is the row number in this case, in addition to the list element (.x). This makes the multiple multiplication step simple.

str_extract_all extracts all the numbers from a column and outputs those numbers in a list:

str_extract_all(df1$column1, "\\d+")
[[1]]
[1] "12" "15" "20"

[[2]]
[1] "22" "23" "30"

imap iterates over this list and does the multiplication:

imap(str_extract_all(df1$column1, "\\d+"), ~ as.numeric(.x) * .y)
[[1]]
[1] 12 15 20

[[2]]
[1] 44 46 60

Then the other two pipes are to reshape the data.

Output

  Time    Channel
  <chr>     <dbl>
1 column1      12
2 column1      15
3 column1      20
4 column1      44
5 column1      46
6 column1      60
7 column2       4
8 column2      17