Search code examples
regexsyslogazure-log-analyticskql

How do I extend this RegEx with CEF data with these two KeyValue pairs?


I currently have this regex:

?P<key>\w+)=(?P<value>[a-zA-Z0-9-_:/@. ]+

Input row 1: event=1921;json={"source":"A","location":B":"folder":"c:\\windows\\system32"},"id":2,"address":null,"name":"gone";

Input row 2: dev=b;json={"dest":"123","home":AZ":"loc":"sys"},"ab":9,"home":null,"someKey":"someValue";

It correctly extracts the "event=1921;" but does extract the two other types.

  1. How do I extract the "json={...}" using Key (JSON) and Value?
  2. How do I extract "name":"gone" using Key (Name) and Value (gone)? The solution needs to be dynamic since key fields will be named differently in other rows.

Solution

  • you should be able to use the parse operator: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/parseoperator

    for example:

    print input = 'event=1921;json={"source":"A","location":B":"folder":"c:\\windows\\system32"},"id":2,"address":null,"name":"gone";'
    | parse input with * "json=" json:dynamic ',"id"' * '"name":"' name '"' *
    

    if your payload / property names are entirely dynamic, then:

    a. I would recommend you evaluate your options to structure the source data in standard format (currently, even the "json" part of it isn't valid JSON)

    b. you could try the following - functional, but very inefficient (not recommended for large scale data procesing)

    datatable(input:string)
    [
        'event=1921;json={"source":"A","location":B":"folder":"c:\\windows\\system32"},"id":2,"address":null,"name":"gone";',
        'dev=b;json={"dest":"123","home":AZ":"loc":"sys"},"ab":9,"home":null,"someKey":"someValue";'
    ]
    | parse input with prefix ";json={" json:dynamic '},' suffix
    | mv-apply x = extract_all(@'(\w+)=(\w+)', prefix) on (
        project p = pack(tostring(x[0]), x[1])
        | summarize b1 = make_bag(p)
    )
    | mv-apply y = extract_all(@'"(\w+)":"?(\w+)"?', suffix) on (
        project p = pack(tostring(y[0]), y[1])
        | summarize b2 = make_bag(p)
    )
    | project json = strcat("{", json, "}"), b = bag_merge(b1, b2)
    | evaluate bag_unpack(b)