I would like to separate the dates from the text in a column of my data frame. My data look like this:
tt <- structure(list(V1 = c("(Q)üfür (2013)", "'Bi atlayip çikicam' cümlesini fazla ciddiye aldiysak zaar (2016)",
"A'dan Z'ye (o biçim) (1975)", "Gün ortasinda karanlik (Anne) (1990)"
), V2 = c("Ilker Savaskurt", "Bugra Gülsoy", "Ahmet Mekin",
"Yavuzer Çetinkaya")), .Names = c("V1", "V2"), row.names = c(80404L,
90699L, 34694L, 53178L), class = "data.frame")
I used this script to separate dates from text.
pattern <- "[()]"
tt$info <- strsplit(tt$V1,pattern)
tt$Title <-sapply(tt$info, `[[`, 1)
tt$Year <- sapply(tt$info, function(m) (m)[2])
It gives the dates for some rows, but some texts contain more than one pair of parentheses. The date is always at the end of the text, so I need to change the script to get only the last parenthesized group.
I have checked other questions here but I couldn't come up with a solution. Thanks in advance.
One option is stringi's stri_extract_last_regex, which returns the last match of the pattern, here the last run of text between parentheses:
library(stringi)
stri_extract_last_regex(tt$V1, "(?<=\\().*?(?=\\))")
#[1] "2013" "2016" "1975" "1990"
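If you would rather avoid an extra package, the same idea can be sketched in base R with sub. This is only a sketch under the assumption that every title ends with a four-digit year in parentheses, as in the sample data; the Title and Year column names follow the question, and the data frame is re-created here so the snippet runs on its own.

```r
# Sample data from the question (V1 only)
tt <- data.frame(
  V1 = c("(Q)üfür (2013)",
         "'Bi atlayip çikicam' cümlesini fazla ciddiye aldiysak zaar (2016)",
         "A'dan Z'ye (o biçim) (1975)",
         "Gün ortasinda karanlik (Anne) (1990)"),
  stringsAsFactors = FALSE
)

# Pattern anchored at the end of the string: "(dddd)" plus optional trailing spaces.
# Assumption: the year is always four digits and always the last parenthesized group.
year_at_end <- "\\s*\\(([0-9]{4})\\)\\s*$"

# Keep only the captured year; the greedy ".*" swallows any earlier parentheses.
# Rows that do not match are left unchanged by sub(), so as.integer() gives NA for them.
tt$Year <- as.integer(sub(paste0(".*", year_at_end), "\\1", tt$V1))

# Drop the trailing "(year)" to get the title, keeping earlier parentheses intact
tt$Title <- sub(year_at_end, "", tt$V1)

tt$Year
#[1] 2013 2016 1975 1990
```

Because the pattern is anchored with $, earlier groups such as (o biçim) or (Anne) stay in Title instead of being mistaken for the year.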