azure-data-explorer kql adx

Assign custom RegEx to variable in parse operator


I am trying to use the parse operator to parse data into their respective fields. It seems that data is only parsable in between throwaway regex patterns, but I need to capture a pattern into a variable. So far I have the below query:

let Traces = datatable(EventText:string)
[
    '2021-10-04T20:43:03,174    2511 INFO cd060096-c6c4-4ddf-b9f7-5795f6d04514 c2a42807-6ab3-41bb-8d72-1c48f2213c31 iTKTS Fiona (ABSDEF) () () () ITKTSUtil - <ProductFulfillmentResponse>U2028  <errorStatus>UNPROCESSED</errorStatus>U2028  <errorCode>GEN_ERR</errorCode>U2028  <errorDescription>WARNING - UNPROCESSED DUE TO OTHER ERRORS</errorDescription>U2028  <customerDocuments>U2028    <errorDescription>WARNING - UNPROCESSED DUE TO OTHER ERRORS</errorDescription>U2028    <itemFulfillmentInfos>U2028      <errorDescription>WARNING - UNPROCESSED DUE TO OTHER ERRORS</errorDescription>U2028    </itemFulfillmentInfos>U2028  </customerDocuments>U2028</ProductFulfillmentResponse>U2028'
];
Traces  
| parse kind = regex EventText with _timestamp ",\\d{3} " _threadid " " _logLevel " " _clientTransactionId " " _appTransactionId " " _appService " " _bigeazy " \\(" _recordLocator "\\) \\(" _status "\\) \\(" _responseTime "\\) \\(" _serviceName "\\) " _className " - " _message
| project _className, _message

I need _className to match "ITKTSUtil". By default, a variable matches the pattern (.*?). If I change it to _className:long, it matches the pattern (\-\d+). But I need it to match the pattern \w* and then be captured into the variable _className. Is this possible with KQL?


Solution

  • Please try the following approach:

    let Traces = datatable(EventText:string)
    [
        '2021-10-04T20:43:03,174    2511 INFO cd060096-c6c4-4ddf-b9f7-5795f6d04514 c2a42807-6ab3-41bb-8d72-1c48f2213c31 iTKTS Fiona (ABSDEF) () () () ITKTSUtil - <ProductFulfillmentResponse>U2028  <errorStatus>UNPROCESSED</errorStatus>U2028  <errorCode>GEN_ERR</errorCode>U2028  <errorDescription>WARNING - UNPROCESSED DUE TO OTHER ERRORS</errorDescription>U2028  <customerDocuments>U2028    <errorDescription>WARNING - UNPROCESSED DUE TO OTHER ERRORS</errorDescription>U2028    <itemFulfillmentInfos>U2028      <errorDescription>WARNING - UNPROCESSED DUE TO OTHER ERRORS</errorDescription>U2028    </itemFulfillmentInfos>U2028  </customerDocuments>U2028</ProductFulfillmentResponse>U2028'
    ];
    Traces  
    | parse kind = regex flags=U EventText with _timestamp ",\\d{3} " _threadid " " _logLevel " " _clientTransactionId " " _appTransactionId " " _appService " " _bigeazy " \\(" _recordLocator "\\) \\(" _status "\\) \\(" _responseTime "\\) \\(" _serviceName "\\) " _className " - " _message "$"
    | project _className, _message
    

    The main idea is to use the flags option of parse's regex mode: the regex flag U (ungreedy) makes each field match only the required text, and the trailing "$" requires regex mode to perform a full match.

    Please note that if your pattern is known in advance, it is recommended to use parse's simple mode, which is much faster.
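
If only the class name is needed, one possible alternative outside the parse operator is the extract() scalar function, which returns a regex capture group directly. The following is a minimal sketch and not part of the original answer: the sample row is an abbreviated copy of the one above (the message is shortened for readability), and the pattern `\) (\w+) - ` is an assumption about how the class name is delimited in these logs.

```kusto
// Abbreviated copy of the sample row; the message part is shortened for illustration.
let Traces = datatable(EventText:string)
[
    '2021-10-04T20:43:03,174    2511 INFO cd060096-c6c4-4ddf-b9f7-5795f6d04514 c2a42807-6ab3-41bb-8d72-1c48f2213c31 iTKTS Fiona (ABSDEF) () () () ITKTSUtil - <ProductFulfillmentResponse>U2028</ProductFulfillmentResponse>U2028'
];
Traces
// Assumed delimiters: the class name is a run of word characters
// between ") " and " - ". Capture group 1 (\w+) becomes the new column.
| extend _className = extract(@"\) (\w+) - ", 1, EventText)
| project _className
```

For the sample row, the capture group is intended to pick out ITKTSUtil. The remaining fields can still be split with parse as shown above; extract() only helps when a single field needs a specific pattern.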