Good afternoon, I am not an expert in the topic of atomic vectors but I would like some ideas about it
I have the script for the movie "Coco" and I want to be able to get a row that is numbered in the form 1., 2., ... (130 scenes throughout the movie). I want to convert the line of each scene of the movie into a row that contains "Scene 1", "Scene 2", up to "Scene 130" and achieve it sequentially.
url <- "https://www.imsdb.com/scripts/Coco.html"
coco <- read_lines("coco2.txt") #after clean
class(coco)
typeof(coco)
" 48."
[782] " arms full of offerings."
[783] " Once the family clears, Miguel is nowhere to be seen."
[784] " INT. NEARBY CORRIDOR"
[785] " Miguel and Dante hide from the patrolman. But Dante wanders"
[786] " off to inspect a side room."
[787] " INT. DEPARTMENT OF CORRECTIONS"
[788] " Miguel catches up to Dante. He overhears an exchange in a"
[789] " nearby cubicle."
[797] " 49."
[798] " And amigos, they help their amigos."
[799] " worth your while."
[800] " workstation."
[801] " Miguel perks at the mention of de la Cruz."
[809] " Miguel follows him."
[810] " 50." # Its scene number
[811] " INT. HALLWAY"
s <- grep(coco, pattern = "[^Level].[0-9].$", value = TRUE)
My solution is wrong because it is not sequential
v <- gsub(s, pattern = "[^Level].[0-9].$", replacement = paste("Scene", sequence(1:130)))
[1] " Scene1"
[2] " Scene1"
[3] " Scene1"
[4] " Scene1"
[5] " Scene1"
[6] " Scene1"
I'm not clear on what [^Level]
represents. However, if the numbers at the end of lines in the text represent the Scene numbers, then you can use ( ) to capture the numbers and substitute them in your replacement text as shown below:
v <- gsub(s, pattern = " ([0-9]{1,3})\\.$", replacement = "Scene \\1")