I have a huge text file with the following structure:
AA<-tibble::tribble(
~`-------------------------------------------------`,
"ABCD 2002201234 09-06-2015 10:34",
"-------------------------------------------------",
"Lorem ipsum",
"Lorem ipsum",
"Lorem ipsum Lorem ipsum",
"Lorem ipsum: Lorem ipsum",
"123456",
"AB",
"AB",
"Lorem ipsum",
"-------------------------------------------------",
"ABCDEF 1001101234 05-03-2011 09:15",
"-------------------------------------------------",
"TEST",
"TEST"
)
I want to organise the above into a data frame with the variables ID, DATE and TEXT. ID should be the 10-digit number (2002201234 and 1001101234 in the example), DATE is self-explanatory, and TEXT should be all the text from the dashed line below each header ("-------------") to the dashed line above the next post.
What is the easiest way to do this?
Regards, H
In base R (gregexec() requires R 4.1.0 or later):
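x <- paste(AA[[1]], collapse = '\n')  # join all lines into one string
# capture (1) the 10-digit ID, (2) the rest of that line (date and time),
# (3) everything after the dashed separator up to the next hyphen
y <- regmatches(x, gregexec("(\\d{10}) *(.*?)\n-+([^-]+)", x, perl = TRUE))[[1]]
# rows 2:4 of the match matrix hold the three groups, one column per post
setNames(data.frame(t(y[2:4,])), c('ID', 'Date', 'Text'))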
ID Date Text
<chr> <chr> <chr>
1 2002201234 09-06-2015 10:34 "\nLorem ipsum\nLorem ipsum\nLorem ipsum Lo…
2 1001101234 05-03-2011 09:15 "\nTEST\nTEST"
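The regex above uses [^-]+ to stop at the next dashed line, so a post whose body contains a hyphen (a date, or a word like "well-known") would be cut short at that hyphen. A line-based alternative avoids this and works directly on the lines of the file. This is a minimal sketch, not the answerer's method: it assumes every header line has the form "<word> <10-digit ID> <dd-mm-yyyy> <hh:mm>" and that separator lines contain only dashes. The helper name parse_posts() is hypothetical.

```r
# Hypothetical helper: split a character vector of lines into posts.
# Assumes header lines look like "<word> <10-digit ID> <dd-mm-yyyy> <hh:mm>"
# and separator lines contain only dashes (plus optional trailing whitespace).
parse_posts <- function(lines) {
  header_re <- "^\\S+\\s+(\\d{10})\\s+(\\d{2}-\\d{2}-\\d{4} \\d{2}:\\d{2})\\s*$"
  is_header <- grepl(header_re, lines, perl = TRUE)
  is_sep    <- grepl("^-+\\s*$", lines, perl = TRUE)

  # Post number of each line: increments at every header, 0 before the first one
  post <- cumsum(is_header)
  keep <- post > 0 & !is_header & !is_sep

  ids   <- sub(header_re, "\\1", lines[is_header], perl = TRUE)
  dates <- sub(header_re, "\\2", lines[is_header], perl = TRUE)

  # Group body lines by post in a single pass; a post with no body becomes ""
  body <- split(lines[keep], factor(post[keep], levels = seq_along(ids)))
  text <- unname(vapply(body, paste, character(1), collapse = "\n"))

  data.frame(ID = ids, DATE = dates, TEXT = text, stringsAsFactors = FALSE)
}

# Small example with a hyphen inside a post body
lines <- c("ABCD 2002201234 09-06-2015 10:34",
           "-------------------------------------------------",
           "Lorem ipsum",
           "a well-known phrase",
           "-------------------------------------------------",
           "ABCDEF 1001101234 05-03-2011 09:15",
           "-------------------------------------------------",
           "TEST",
           "TEST")
df <- parse_posts(lines)
df$TEXT
```

With the AA tibble from the question this would be called as parse_posts(AA[[1]]), and for the real file as parse_posts(readLines("posts.txt")), where "posts.txt" is a placeholder name. Unlike the regex answer, TEXT here has no leading newline. Note that readLines() reads the whole file into memory.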