regex pcre

Optional element between .*?


I'm trying to extract an optional element via PCRE from the following example. I need to pull out the xxxx-xxx-xxxx-xxxx-xxxxx if ActivityID exists.

I'm guessing I need to use lookaheads or the like but I can't quite wrap my head around it.

</Level><Task>...<Correlation ActivityID='{xxxx-xxx-xxxx-xxxx-xxxxx}'/><Execution...</Channel>

This works if the element exists, capturing the ID in taco64:
<\/level>(?<taco16>.*?)ActivityID='{(?<taco64>.*)}'(?<taco32>.*?)<Computer>

Making the ActivityID group optional drops everything into taco32:
<\/level>(?<taco16>.*?)(ActivityID='{(?<taco64>.*)}')?(?<taco32>.*?)<Computer>
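
The behaviour described above can be reproduced with a minimal sketch. It rests on several assumptions: Python's `re` module stands in for PCRE (so named groups are written `(?P<name>...)`), the literal braces are escaped, matching is case-insensitive because the sample has `</Level>` while the pattern has `level`, and the tail of the sample line after `<Execution` is invented for illustration. The lazy `taco16` matches nothing, the optional group is tried at that spot and skipped, and `taco32` then swallows the rest up to `<Computer>`.

```python
import re

# Python's re is used as a stand-in for PCRE (assumption); named groups are
# spelled (?P<name>...) here. re.IGNORECASE is also an assumption, needed
# because the sample contains </Level> while the pattern says level.
ATTEMPT = re.compile(
    r"</level>(?P<taco16>.*?)(ActivityID='\{(?P<taco64>.*)\}')?(?P<taco32>.*?)<Computer>",
    re.IGNORECASE,
)

# Hypothetical event line: everything after <Execution is made up for the demo.
line = (
    "</Level><Task>0</Task><Correlation ActivityID='{xxxx-xxx-xxxx-xxxx-xxxxx}'/>"
    "<Execution/><Channel>App</Channel><Computer>host</Computer>"
)

match = ATTEMPT.search(line)
attempt_groups = match.groupdict()

# taco16 is empty, taco64 did not participate, taco32 holds the whole middle.
for name, value in attempt_groups.items():
    print(name, "=", repr(value))
```

Because the overall match already succeeds with the optional group skipped, the engine has no reason to backtrack and grow `taco16` until it reaches `ActivityID`.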

Solution

  • Use

    <\/level>(?:(?<taco16>.*?)(ActivityID='{(?<taco64>.*)}'))?(?<taco32>.*?)<Computer>
    

    See regex proof.

    EXPLANATION

    --------------------------------------------------------------------------------
      <                        '<'
    --------------------------------------------------------------------------------
      \/                       '/'
    --------------------------------------------------------------------------------
      level>                   'level>'
    --------------------------------------------------------------------------------
      (?:                      group, but do not capture (optional
                               (matching the most amount possible)):
    --------------------------------------------------------------------------------
        (?<taco16>                 group and capture to taco16:
    --------------------------------------------------------------------------------
          .*?                      any character except \n (0 or more
                                   times (matching the least amount
                                   possible))
    --------------------------------------------------------------------------------
        )                        end of taco16
    --------------------------------------------------------------------------------
        (                        group and capture to \2:
    --------------------------------------------------------------------------------
          ActivityID='{            'ActivityID=\'{'
    --------------------------------------------------------------------------------
          (?<taco64>               group and capture to taco64:
    --------------------------------------------------------------------------------
            .*                       any character except \n (0 or more
                                     times (matching the most amount
                                     possible))
    --------------------------------------------------------------------------------
          )                        end of taco64
    --------------------------------------------------------------------------------
          }'                       '}\''
    --------------------------------------------------------------------------------
        )                        end of \2
    --------------------------------------------------------------------------------
      )?                       end of grouping
    --------------------------------------------------------------------------------
      (?<taco32>                group and capture to taco32:
    --------------------------------------------------------------------------------
        .*?                      any character except \n (0 or more times
                                 (matching the least amount possible))
    --------------------------------------------------------------------------------
      )                        end of taco32
    --------------------------------------------------------------------------------
      <Computer>               '<Computer>'
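
The pattern broken down above can be checked against a line with and without ActivityID. This is a sketch under assumptions, not a PCRE run: it uses Python's `re` module (named groups become `(?P<name>...)`), escapes the literal braces, adds `re.IGNORECASE` because the sample has `</Level>`, and uses two invented sample lines. The helper name `extract_activity_id` is hypothetical.

```python
import re

# Same structure as the answer's pattern, translated to Python's re syntax
# (assumption: Python stands in for PCRE; case-insensitivity is also assumed).
FIXED = re.compile(
    r"</level>(?:(?P<taco16>.*?)(ActivityID='\{(?P<taco64>.*)\}'))?(?P<taco32>.*?)<Computer>",
    re.IGNORECASE,
)


def extract_activity_id(line):
    """Return the ActivityID value if present, otherwise None."""
    match = FIXED.search(line)
    return match.group("taco64") if match else None


# Hypothetical event lines: the content after <Execution is made up.
with_id = (
    "</Level><Task>0</Task><Correlation ActivityID='{xxxx-xxx-xxxx-xxxx-xxxxx}'/>"
    "<Execution/><Channel>App</Channel><Computer>host</Computer>"
)
without_id = (
    "</Level><Task>0</Task><Correlation/>"
    "<Execution/><Channel>App</Channel><Computer>host</Computer>"
)

print(extract_activity_id(with_id))
print(extract_activity_id(without_id))
```

Wrapping `taco16` and the ActivityID group in one optional non-capturing group forces the lazy `.*?` to keep expanding until `ActivityID='{` is found, and the whole group is dropped only when that never happens. One caveat: the greedy `.*` inside `taco64` runs to the last `}'` on the line, so if more than one can occur, a narrower class such as `[^}]*` may be safer.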