regex, pcre, regex-lookarounds, logstash-grok, regex-greedy

Greedy match unless it runs into specific string, then match to specific group?


I'm trying to parse a URL in Logstash using regex/grok. I've figured out most of the string, but I'm stuck on the last part, which I've found difficult to explain.

Here is the part I'm stuck on:

In Logstash, I would like to capture this whole string and dump it into a field called api_info, UNLESS it contains the string &freeText=, in which case I want everything up until &freeText= to go into the api_info field, and everything after the &freeText= to go into the api_search field. Otherwise, the api_search field should be null.

Here's what I have tried so far:

(?<api_info>.*?)(?=&freeText=)?(:?&freeText=)(?<api_search>.*)?
(?<api_info>.*)((:?&freeText=)(?<api_search>.*))?

Input string: womens%7cshoes%ctrainer&pageSize=60&freeText=shoes30

expected input/output:

womens%7cshoes%ctrainer&pageSize=60&freeText=shoes30
api_info:"womens%7cshoes%ctrainer&pageSize=60", api_search:"shoes30"
mens%7trainers&pageSize=90
api_info:"mens%7trainers&pageSize=90", api_search:null

Solution

  • Not sure if an empty group converts to null, but you could use an alternation that matches either &freeText= or the end of the string ($).

    For the api_search group, you could match any char 0+ times.

    (?<api_info>.+?)(?:&freeText=|$)(?<api_search>.*)
    

    Explanation

    • (?<api_info>.+?) Group api_info, match any char except newline 1+ times, as few times as possible (lazy)
    • (?:&freeText=|$) Match either &freeText= or assert end of string
    • (?<api_search>.*) Group api_search, match any char except newline 0+ times
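
To check the pattern outside Logstash, here is a minimal Python sketch run against the two sample inputs from the question. It makes a few assumptions: Python's `re` module spells named groups `(?P<name>...)` rather than `(?<name>...)`, the helper name `split_api_fields` is hypothetical, and turning an empty `api_search` into `None` is done by hand here, so it says nothing about how Logstash itself treats an empty capture.

```python
import re

# Same pattern as above, rewritten with Python's (?P<name>...) group syntax.
PATTERN = re.compile(r"(?P<api_info>.+?)(?:&freeText=|$)(?P<api_search>.*)")


def split_api_fields(query):
    """Split a query string into api_info and api_search (hypothetical helper)."""
    m = PATTERN.match(query)
    if m is None:
        # .+? needs at least one character, so an empty input does not match.
        return None
    return {
        "api_info": m.group("api_info"),
        # Assumption: an empty api_search capture is reported as None.
        "api_search": m.group("api_search") or None,
    }


if __name__ == "__main__":
    samples = [
        "womens%7cshoes%ctrainer&pageSize=60&freeText=shoes30",
        "mens%7trainers&pageSize=90",
    ]
    for sample in samples:
        print(sample, "->", split_api_fields(sample))
```

Because `.+?` is lazy, the engine extends `api_info` one character at a time and stops at the first position where either `&freeText=` or the end of the string matches. If `&freeText=` appears more than once, the split happens at the first occurrence.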
