I'm trying to parse a URL in Logstash using regex/grok. I've figured out most of the string, but I'm stuck on the last part, which I've found difficult to explain.
Here is the part I'm stuck on:
In Logstash, I would like to capture this whole string and dump it into a field called api_info, UNLESS it contains the string &freeText=, in which case I want everything up until &freeText= to go into the api_info field, and everything after the &freeText= to go into the api_search field. Otherwise, the api_search field should be null.
Here's what I have tried so far:
(?<api_info>.*?)(?=&freeText=)?(:?&freeText=)(?<api_search>.*)?
(?<api_info>.*)((:?&freeText=)(?<api_search>.*))?
Input string:
womens%7cshoes%ctrainer&pageSize=60&freeText=shoes30
Expected input/output:
womens%7cshoes%ctrainer&pageSize=60&freeText=shoes30
api_info:"womens%7cshoes%ctrainer&pageSize=60", api_search:"shoes30"
mens%7trainers&pageSize=90
api_info:"mens%7trainers&pageSize=90", api_search:null
Not sure if an empty group converts to null, but you could use an alternation to match either &freeText= or the end of the string ($). For the api_search group, you could match any char 0+ times.
(?<api_info>.+?)(?:&freeText=|$)(?<api_search>.*)
Explanation
(?<api_info>.+?) Group api_info, match any char except newline 1+ times, as few as possible
(?:&freeText=|$) Match either &freeText= or assert end of string
(?<api_search>.*) Group api_search, match any char except newline 0+ times
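To check the pattern outside Logstash, here is a minimal Python sketch run against the two sample strings from the question. Python's re module spells named groups as (?P<name>...) rather than (?<name>...), so the pattern is adapted accordingly. The helper name split_query is hypothetical, and converting an empty api_search to None is an assumption made here to mirror the desired output; it does not show how Logstash itself treats an empty capture.

```python
import re

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
PATTERN = re.compile(r"(?P<api_info>.+?)(?:&freeText=|$)(?P<api_search>.*)")


def split_query(text):
    """Hypothetical helper: split text into api_info and api_search."""
    match = PATTERN.match(text)
    if match is None:
        # Only happens when text is empty, since api_info needs 1+ chars.
        return None
    return {
        "api_info": match.group("api_info"),
        # Assumption: report an empty capture as None (null).
        "api_search": match.group("api_search") or None,
    }


if __name__ == "__main__":
    samples = [
        "womens%7cshoes%ctrainer&pageSize=60&freeText=shoes30",
        "mens%7trainers&pageSize=90",
    ]
    for sample in samples:
        print(split_query(sample))
```

In the second sample the lazy api_info group expands to the end of the string, so api_search matches an empty string. Whether that empty capture ends up as a missing field or an empty one in Logstash depends on the grok filter settings.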