First: find the text inside quotation marks, e.g. "I want everything inside here".
Second: extract the one sentence that precedes each quotation.
I would like to achieve the desired output with a lookbehind regex in R, if possible.
Example:
Yoyo. He is sad. Oh no! "Don't sad!" Yeah: "Testing... testings," Boys. Sun. Tree... 0.2% green,"LL" "WADD" HOLA.
Desired Output:
[1] Oh no! "Don't sad!"
[2] Yeah: "Testing... testings"
[3] Tree... 0.2% green, "LL"
[4] Tree... 0.2% green, "LL" "WADD"
dput:
"Yoyo. He is sad. Oh no! \"Don't sad!\" Yeah: \"Testing... testings,\" Boys. Sun. Tree... 0.2% green,\"LL\" \"WAAD\" HOLA."
I tried this, but it doesn't work:
str_extract(t, "(?<=\\.\\s)[^.:]*[.:]\\s*\"[^\"]*\"")
Also tried:
regmatches(t , gregexpr('^[^\\.]+[\\.\\,\\:]\\s+(.*(?:\"[^\"]+\\")).*$', t))
regmatches(t , gregexpr('\"[^\"]*\"(?<=\\s[.?][^\\.\\s])', t))
I tried your method, @naurel:
> regmatches(t, regexpr("(?:\"? *([^\"]*))(\"[^\"]*\")", t, perl=T))
[1] " Yoyo. He is sad. Oh no! \"Don't sad!\""
Since you only want the last sentence before each quotation, I've cleaned up the regex for you.
Explanation: first, you are looking for something between quotes. If several quoted strings follow one another, you want them to match as one.
(\"[^\"]*\"(?: *\"[^\"]*\")*)
This does the trick. Next, you want to match the sentence before this group. A sentence starts with a capital letter, so we start the match at the nearest capital letter before the previously defined group (i.e. a capital letter that is not followed by any other capital letter before the quotes).
([A-Z](?:[a-z0-9\W\s])*)
Put it together and you obtain:
([A-Z](?:[a-z0-9\W\s])*)(\"[^\"]*\"(?: *\"[^\"]*\")*)
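As a rough sketch of how the two patterns above could be applied in base R: the input is the string from the question's dput line, and the variable names (txt, quotes_pattern, full_pattern) are made up for illustration. It assumes perl = TRUE, as in the attempt shown earlier, and uses gregexpr() rather than regexpr() so that every match is returned instead of only the first.

```r
# Input string copied from the question's dput output.
txt <- "Yoyo. He is sad. Oh no! \"Don't sad!\" Yeah: \"Testing... testings,\" Boys. Sun. Tree... 0.2% green,\"LL\" \"WAAD\" HOLA."

# Quoted group: one quoted string, optionally followed by more quoted
# strings separated only by spaces, all matched as a single unit.
quotes_pattern <- "(\"[^\"]*\"(?: *\"[^\"]*\")*)"

# Sentence part: a capital letter followed by anything that is not a
# capital letter, then the quoted group defined above.
full_pattern <- paste0("([A-Z](?:[a-z0-9\\W\\s])*)", quotes_pattern)

# gregexpr() finds all matches; regmatches() extracts the matched text.
quoted  <- regmatches(txt, gregexpr(quotes_pattern, txt, perl = TRUE))[[1]]
matches <- regmatches(txt, gregexpr(full_pattern, txt, perl = TRUE))[[1]]

print(quoted)
print(matches)
```

On this sample the combined pattern should give three matches, not the four listed in the desired output: consecutive quotes such as "LL" "WAAD" come back as one match, and the comma inside "Testing... testings," is kept because it sits inside the quotation marks. Because a sentence is detected purely by capital letters, a capital letter in the middle of a sentence (a name, for instance) would cut the match short.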