Here is a table of contents:
df <- tibble(ToC=
c("3.1 texta.............. 22",
"3.2 textb 25",
"section 6 ................. 50",
"section 10.2 65"))
I want to extract the contents and their respective page numbers as two variables. I tried the following, but it's not working correctly.
library(tidyverse)  # stringr is attached as part of the core tidyverse
df_toc <- df %>%
mutate(page = as.numeric(str_extract(ToC, "[0-9]+")))
The correct page numbers should be 22, 25, 50, and 65. How should I solve this?
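str_extract() returns only the first match, so "[0-9]+" picks up the leading section number (3, 3, 6, 10) rather than the page. Anchor the pattern to the end of the string with $ so that only the trailing digits match: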
df %>%
  mutate(page = as.numeric(str_extract(ToC, "\\d+$")))
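The question also asks for the contents as a separate variable. A minimal sketch of one way to get both columns, assuming every entry ends with its page number and that any dot leaders sit directly before it (the column name content is my own choice, not from the question):

```r
library(tidyverse)

df <- tibble(ToC = c("3.1 texta.............. 22",
                     "3.2 textb 25",
                     "section 6 ................. 50",
                     "section 10.2 65"))

df_toc <- df %>%
  mutate(
    # page: the run of digits anchored to the end of the string
    page = as.numeric(str_extract(ToC, "\\d+$")),
    # content: strip any dot leaders/whitespace plus the trailing page number
    content = str_remove(ToC, "[.\\s]*\\d+$")
  )

df_toc
```

If some entries could have trailing whitespace after the page number, wrapping ToC in str_trim() before matching should keep the $ anchor working.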