
Regex: Substitute only within a specific group - SEDCMD (Splunk)

[so much data exists in the same single line ] ,"Comments": "New alert", **"Data": "{\"etype\":\"MalwareFamily\",\"at\":\"2024-06-21T11:34:07.0000000Z\",\"md\":\"2024-06-21T11:34:07.0000000Z\",\"Investigations\":[{\"$id\":\"1\",\"Id\":\"urn:ZappedUrlInvestigation:2cc87ae3\",\"InvestigationStatus\":\"Running\"}],\"InvestigationIds\":[\"urn:ZappedUrlInvestigation:2cc8782d063\"],\"Intent\":\"Probing\",\"ResourceIdentifiers\":[{\"$id\":\"2\",\"AadTenantId\":\"2dfb29-729c918\",\"Type\":\"AAD\"}],\"AzureResourceId\":null,\"WorkspaceId\":null,\"Metadata\":{\"CustomApps\":null,\"GenericInfo\":null},\"Entities\":[{\"$id\":\"3\",\"MailboxPrimaryAddress\":\"\",\"Upn\":\"\",\"AadId\":\"6eac3b76357\",\"RiskLevel\":\"None\",\"Type\":\"mailbox\",\"Urn\":\"urn:UserEntity:10338af2b6c\",\"Source\":\"TP\",\"FirstSeen\":\"0001-01-01T00:00:00\"}, \"StartTimeUtc\": \"2024-06-21T10:12:37\", \"Status\": \"Investigation Started\"}",** "EntityType": "MalwareFamily", [so much data exists in the same single line ]

All of this data is on a single line.

  1. I want to substitute (\") with (") only inside the value of the Data dictionary, nothing before it and nothing after it. Sample regex: (only the highlighted data, in group 4, should be modified.)
  2. The dictionary value is enclosed in quotes (as a string). I want those quotes replaced by [] brackets so that it becomes a list (groups 3 and 6).
For example, the expected result: [so much data exists in the same single line ],"Comments": "New alert", "Data": [{"etype":"MalwareFamily", ... ,"Status":"Investigation Started"}],"EntityType": "MalwareFamily", [so much data exists in the same single line ]
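Outside Splunk, the transformation described in the two points above can be sketched in Python with a callback substitution: the replacement is computed per match, so the \" to " change happens only inside the Data value and the rest of the line is left alone. This is a minimal sketch, assuming the Data value is a well-formed JSON string literal; the names `DATA_FIELD` and `unescape_data_only` are hypothetical and not part of Splunk.

```python
import re

# Hypothetical pattern: the "Data" key (group 1) followed by a JSON string
# literal whose escaped body is captured in group 2.
DATA_FIELD = re.compile(r'("Data":\s*)"((?:[^"\\]|\\.)*)"')


def unescape_data_only(raw: str) -> str:
    """Replace escaped quotes and swap the surrounding quotes for [ ] in the Data value only."""

    def fix(match: re.Match) -> str:
        key, body = match.group(1), match.group(2)
        # The \" -> " replacement is applied to group 2 only.
        return key + "[" + body.replace('\\"', '"') + "]"

    # count=1: only the first "Data" field on the line is rewritten.
    return DATA_FIELD.sub(fix, raw, count=1)


event = r'"Comments": "New alert", "Data": "{\"etype\":\"MalwareFamily\",\"Status\": \"Investigation Started\"}", "EntityType": "MalwareFamily", "Note": "say \"hi\""'
print(unescape_data_only(event))
```

The hypothetical `Note` field keeps its escaped quotes, which is the scoping the question asks for.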


  • In Splunk, SEDCMD works on _raw; there is no option to apply it to a specific field.

    Workaround for when a field value is passed as a string instead of a list in a JSON file:

    Search-time extraction:

    | rex mode=sed "s/(\"Data\":\s+)\"/\1[/g s/(\"Data\":\s+\[{.*})\"/\1]/g s/\\\\\"/\"/g" 
    | extract pairdelim="\"{,}" kvdelim=":"

    Index-time extraction:

    SEDCMD-o365DataJsonRemoveBackSlash = s/(\\)+"/"/g s/(\"Data\":\s+)\"/\1[/g s/(\"Data\":\s+\[{.*})\"/\1]/g
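To check what the two chains do before deploying them, their expressions can be replayed in order with Python's `re.sub`. This is an approximation, assuming Python's `re` behaves like Splunk's regex engine for these simple patterns and reading the `rex` expressions with the SPL string escaping removed; braces are escaped for clarity, and `apply_chain`, `SEARCH_TIME`, and `INDEX_TIME` are hypothetical names.

```python
import re

# Search-time rex chain (SPL string escaping removed), in the original order.
SEARCH_TIME = [
    (r'("Data":\s+)"', r'\1['),          # opening quote of the Data value -> [
    (r'("Data":\s+\[\{.*\})"', r'\1]'),  # quote after the last } on the line -> ]
    (r'\\"', '"'),                       # every \" in the event -> "
]

# Index-time SEDCMD chain, in the original order.
INDEX_TIME = [
    (r'(\\)+"', '"'),                    # one or more backslashes before " -> "
    (r'("Data":\s+)"', r'\1['),          # opening quote of the Data value -> [
    (r'("Data":\s+\[\{.*\})"', r'\1]'),  # quote after the last } on the line -> ]
]


def apply_chain(raw: str, chain) -> str:
    """Apply each (pattern, replacement) pair to every match, like s/.../.../g."""
    for pattern, replacement in chain:
        raw = re.sub(pattern, replacement, raw)
    return raw


event = r'"Comments": "New alert", "Data": "{\"etype\":\"MalwareFamily\",\"Status\": \"Investigation Started\"}", "EntityType": "MalwareFamily"'
print(apply_chain(event, SEARCH_TIME))
print(apply_chain(event, INDEX_TIME))
```

On this trimmed-down event both chains produce the same line. Replaying them also shows two properties of the expressions: the backslash removal (last at search time, first at index time) runs over the whole event, so escaped quotes in other fields are unescaped as well; and the greedy `.*` in the closing-bracket expression stops at the last `}"` on the line, so it relies on no later field ending in `}"`.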