regex, r, pcre

Regex (PCRE): extract the characters between the nth occurrences of two markers


I have data that resembles the following structure. I need to extract the data that is between the third occurrence of "May 2016" and "Jun 2016".

I have the following pattern, which (to be frank) is not properly constructed and does not return the characters I want.

(.*(?>May 2016)){3}(.*(?=Jun 2016)){3}/s

I am new to regex. Can someone help me with the correct expression, please?

May 2016 ef Jun 2016 efef May 2016 Jun 2016 May 2016

dffdg def efef

Jun 2016

May 2016

Jun 2016


Solution

  • If we can assume that "May 2016" and "Jun 2016" strictly alternate, starting with "May 2016", then we can skip the first two May/Jun pairs and capture the text inside the third:

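    x <- "May 2016 A Jun 2016 B May 2016 Jun 2016 May 2016 C Jun 2016 May 2016 Jun 2016"
    # Group 1, repeated twice, consumes the first two May/Jun pairs;
    # group 2 lazily captures the text between the third "May 2016" and the next "Jun 2016".
    sub("(.*?May 2016.*?Jun 2016){2}.*?May 2016(.*?)Jun 2016.*", "\\2", x)
    [1] " C "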
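
The data in the question spans several lines and the title asks about PCRE, so here is a hedged sketch of how the same idea might look with `perl = TRUE`. It is not part of the original answer. Under PCRE, `.` does not match a newline unless the `(?s)` flag is set, so both patterns start with it. The helper name `extract_between` and the two pattern variables are made up for illustration, and the sample string assumes the blank lines in the question's data are plain newline characters.

```r
# Question's sample, assuming the blank lines are plain "\n" characters.
x <- paste(
  "May 2016 ef Jun 2016 efef May 2016 Jun 2016 May 2016",
  "dffdg def efef",
  "Jun 2016",
  "May 2016",
  "Jun 2016",
  sep = "\n\n"
)

# Hypothetical helper: return capture group 1 of a PCRE pattern, trimmed,
# or NA when the pattern does not match.
extract_between <- function(text, pattern) {
  m <- regmatches(text, regexec(pattern, text, perl = TRUE))[[1]]
  if (length(m) < 2) NA_character_ else trimws(m[2])
}

# Variant 1: assumes strict May/Jun alternation and skips two full pairs.
pairs_pattern <- "(?s)^(?:.*?May 2016.*?Jun 2016){2}.*?May 2016(.*?)Jun 2016"

# Variant 2: counts only "May 2016" and captures from the third one
# up to the next "Jun 2016", with no alternation assumption.
may_only_pattern <- "(?s)^(?:.*?May 2016){3}(.*?)Jun 2016"

by_pairs <- extract_between(x, pairs_pattern)
by_may_count <- extract_between(x, may_only_pattern)
print(by_pairs)
print(by_may_count)
```

On this sample both variants print `"dffdg def efef"`. They can differ when the markers do not alternate: variant 2 follows the literal wording of the question (third "May 2016", then the next "Jun 2016"), while variant 1 follows the answer's pairing assumption.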