I'm currently trying to convert some YAML into JSON using Python, and I'm having a hard time getting the JSON formatted properly. My YAML file has multiple documents that look like this:
title: Windows Shell Spawning Suspicious Program
status: experimental
description: Detects a suspicious child process of a Windows shell
references:
    - https://mgreen27.github.io/posts/2018/04/02/DownloadCradle.html
author: Florian Roth
date: 20018/04/06
logsource:
    product: windows
    service: sysmon
detection:
    selection:
        EventID: 1
        ParentImage:
            - '*\mshta.exe'
            - '*\powershell.exe'
            - '*\cmd.exe'
            - '*\rundll32.exe'
            - '*\cscript.exe'
            - '*\wscript.exe'
            - '*\wmiprvse.exe'
        Image:
            - '*\schtasks.exe'
            - '*\nslookup.exe'
            - '*\certutil.exe'
            - '*\bitsadmin.exe'
            - '*\mshta.exe'
    condition: selection
fields:
    - CommandLine
    - ParentCommandLine
falsepositives:
    - Administrative scripts
level: medium
...
What I'm trying to do is, for every document, pull out the detection, fields, falsepositives, and level sections and put them into a JSON document as individual arrays. My first attempt was pretty poor and just lumped the sections from every document together into lists:
import json
import yaml

# yaml_file and json_file are paths defined elsewhere in my script
data = {}
data['indicator'] = {}
data['indicator']['detection'] = []
data['indicator']['fields'] = []
data['indicator']['false positives'] = []
data['indicator']['level'] = []
with open(yaml_file, 'r') as yaml_in, open(json_file, 'a') as definition:
    loadyaml = yaml.safe_load_all(yaml_in)
    for item in loadyaml:
        # .items() works on Python 2 and 3; .iteritems() is Python 2 only
        for header, subsections in item.items():
            if header == 'detection':
                data['indicator']['detection'].append(subsections)
            elif header == 'fields':
                data['indicator']['fields'].append(subsections)
            # the YAML key is spelled 'falsepositives' (no space)
            elif header == 'falsepositives':
                data['indicator']['false positives'].append(subsections)
            elif header == 'level':
                data['indicator']['level'].append(subsections)
    # write once, after every document has been processed
    json.dump(data, definition, indent=4)
I'd like each of my documents to appear in my JSON output as an individual indicator, with its detection, fields, falsepositives, and level grouped together, but my Python abilities are failing me.
Any insight I could get on this would be greatly appreciated!
You can get the output you want by iterating over .load_all() with a much smaller program:
import sys
import json
import ruamel.yaml

yaml = ruamel.yaml.YAML(typ='safe')
ind = dict()
data = dict(indicator=ind)
with open('input.yaml') as fp:
    # load_all() yields one Python object per YAML document in the stream
    for d in yaml.load_all(fp):
        # append this document's value to the list kept for each key
        for k in ('detection', 'fields', 'falsepositives', 'level'):
            ind.setdefault(k, []).append(d[k])
json.dump(data, sys.stdout, indent=2)
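The loop above uses d[k], so a document that lacks one of the four keys (for instance a rule with no falsepositives section) would raise KeyError. Here is a minimal sketch of a more forgiving variant, assuming you would rather record None for a missing key. The function name collect_by_key is hypothetical, and it takes any iterable of already-loaded documents, such as what yaml.load_all() returns:

```python
import json
import sys

# Keys pulled from each YAML document (same four as in the answer's loop)
KEYS = ('detection', 'fields', 'falsepositives', 'level')


def collect_by_key(docs, keys=KEYS):
    """Hypothetical helper: build {'indicator': {key: [value per document]}}.

    Uses dict.get(), so a document missing a key contributes None
    instead of raising KeyError.
    """
    ind = {k: [] for k in keys}
    for d in docs:
        for k in keys:
            ind[k].append(d.get(k))
    return {'indicator': ind}


if __name__ == '__main__':
    # Two already-parsed documents; the second has no 'falsepositives' key
    docs = [
        {'detection': {'condition': 'selection'}, 'fields': ['CommandLine'],
         'falsepositives': ['Administrative scripts'], 'level': 'medium'},
        {'detection': {'condition': 'selection'}, 'fields': ['Shell'],
         'level': 'high'},
    ]
    json.dump(collect_by_key(docs), sys.stdout, indent=2)
```

Recording None rather than skipping the document keeps every list the same length, so index i refers to the same YAML document in each of the four lists.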
If you have a file input.yaml:
---
title: Windows Shell Spawning Suspicious Program
status: experimental
description: Detects a suspicious child process of a Windows shell
references:
    - https://mgreen27.github.io/posts/2018/04/02/DownloadCradle.html
author: Florian Roth
date: 20018/04/06
logsource:
    product: windows
    service: sysmon
detection:
    selection:
        EventID: 1
        ParentImage:
            - '*\mshta.exe'
            - '*\powershell.exe'
            - '*\cmd.exe'
            - '*\rundll32.exe'
            - '*\cscript.exe'
            - '*\wscript.exe'
            - '*\wmiprvse.exe'
        Image:
            - '*\schtasks.exe'
            - '*\nslookup.exe'
            - '*\certutil.exe'
            - '*\bitsadmin.exe'
            - '*\mshta.exe'
    condition: selection
fields:
    - CommandLine
    - ParentCommandLine
falsepositives:
    - Administrative scripts
level: medium
...
---
title: Bash starting just what is asked
status: stable
description: No negative side effects
references:
    - https://nblue24.github.io/posts/2019/04/01/DownloadBed.html
author: Axel Roth
date: 2019/04/01
logsource:
    product: linux
    service: good
detection:
    selection:
        EventID: 42
        ParentImage:
            - '*/bash'
            - '*/ash'
        Image:
            - systemctl
            - init
    condition: selection
fields:
    - Shell
    - ParentShell
falsepositives:
    - root programs
level: high
...
Your output will be:
{
  "indicator": {
    "detection": [
      {
        "selection": {
          "EventID": 1,
          "ParentImage": [
            "*\\mshta.exe",
            "*\\powershell.exe",
            "*\\cmd.exe",
            "*\\rundll32.exe",
            "*\\cscript.exe",
            "*\\wscript.exe",
            "*\\wmiprvse.exe"
          ],
          "Image": [
            "*\\schtasks.exe",
            "*\\nslookup.exe",
            "*\\certutil.exe",
            "*\\bitsadmin.exe",
            "*\\mshta.exe"
          ]
        },
        "condition": "selection"
      },
      {
        "selection": {
          "EventID": 42,
          "ParentImage": [
            "*/bash",
            "*/ash"
          ],
          "Image": [
            "systemctl",
            "init"
          ]
        },
        "condition": "selection"
      }
    ],
    "fields": [
      [
        "CommandLine",
        "ParentCommandLine"
      ],
      [
        "Shell",
        "ParentShell"
      ]
    ],
    "falsepositives": [
      [
        "Administrative scripts"
      ],
      [
        "root programs"
      ]
    ],
    "level": [
      "medium",
      "high"
    ]
  }
}
This works on both Python 2 and 3.
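That output still keeps one list per key, so the values from a given document are tied together only by their position in each list. If you want what the question describes, each document as its own indicator with its detection, fields, falsepositives, and level grouped together, a small variation builds a list of per-document objects instead. This is a sketch: the helper name to_indicators and the output layout ({"indicators": [...]}) are assumptions, not something the question specifies.

```python
import json
import sys

# The four sections to keep from each YAML document
KEYS = ('detection', 'fields', 'falsepositives', 'level')


def to_indicators(docs, keys=KEYS):
    """Hypothetical helper: one dict per YAML document, holding only `keys`.

    Missing keys come out as None; everything else (title, author, ...) is dropped.
    """
    return {'indicators': [{k: d.get(k) for k in keys} for d in docs]}


if __name__ == '__main__':
    # Stand-ins for documents produced by yaml.load_all(); trimmed for brevity
    docs = [
        {'title': 'Windows Shell Spawning Suspicious Program',
         'detection': {'condition': 'selection'},
         'fields': ['CommandLine', 'ParentCommandLine'],
         'falsepositives': ['Administrative scripts'],
         'level': 'medium'},
        {'title': 'Bash starting just what is asked',
         'detection': {'condition': 'selection'},
         'fields': ['Shell', 'ParentShell'],
         'falsepositives': ['root programs'],
         'level': 'high'},
    ]
    json.dump(to_indicators(docs), sys.stdout, indent=2)
```

In the real program you would pass the generator returned by yaml.load_all(...) as docs while the input file is still open; the list comprehension consumes every document inside the call.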