An API endpoint that I am using for a project returns a plain text response of the form:
[RESPONSE]
code = 200
description = Command completed successfully
queuetime = 0
runtime = 0.071
property[abuse policy][0] = The policies are published at the REGISTRY_OPERATOR website at:
property[abuse policy][1] = =>https://registry.in/Policies
property[abuse policy][2] =
property[abuse policy][3] = IN Policy Framework: https://registry.in/system/files/inpolicy_0.pdf
property[abuse policy][4] = IN Domain Anti-Abuse policy: https://registry.in/Policies/IN_Anti_Abuse_Policy
property[abuse policy url][0] = https://registry.in/Policies/IN_Anti_Abuse_Policy
property[active][0] = 0
I am attempting to parse this into a dictionary using Python. Currently, I have the following code:
import re

def text_to_dict(text):
    js = {}
    for s in text.splitlines():
        x = s.split("=", maxsplit=1)
        if len(x) > 1:
            keys = [k for i in re.split(r"\]|\[", x[0]) if (k := i.strip())]
            for i, k in enumerate(keys):
                pd = js
                for j, pk in enumerate(keys[:i]):
                    if keys[j+1:j+2] and not (keys[j+1:j+2][0]).isnumeric():
                        pd = pd[pk]
                if k not in pd:
                    if k.isnumeric():
                        pd[keys[i-1]].append((x[1]).strip())
                    else:
                        pd[k] = (x[1]).strip() if i == len(keys)-1 else [] if keys[i+1:i+2] and (keys[i+1:i+2][0]).isnumeric() else {}
    return js
This code can handle the above example, and it returns:
{
    "code": "200",
    "description": "Command completed successfully",
    "queuetime": "0",
    "runtime": "0.071",
    "property": {
        "abuse policy": [
            "The policies are published at the REGISTRY_OPERATOR website at:",
            "=>https://registry.in/Policies",
            "",
            "IN Policy Framework: https://registry.in/system/files/inpolicy_0.pdf",
            "IN Domain Anti-Abuse policy: https://registry.in/Policies/IN_Anti_Abuse_Policy"
        ],
        "abuse policy url": [
            "https://registry.in/Policies/IN_Anti_Abuse_Policy"
        ],
        "active": [
            "0"
        ]
    }
}
However, it cannot handle the following if I append it to the example above:
...
property[active][1][test] = TEST
or
...
property[active][1][0] = TEST
which should return
{
    ...
    "active": [
        "0",
        {"test": "TEST"}
    ]
}
and
{
    ...
    "active": [
        "0",
        ["TEST"]
    ]
}
respectively.
I feel like there is an easier way of accounting for all possibilities without writing a bunch of nested ifs, but I'm not sure what it is.
Your input data is practically in INI file format, which Python's standard-library configparser module can read for you.
If we assume that every part of the key 'property[foo][0][test]' is actually a dict key (no nested lists), we would parse that into this structure:
{'property': {'foo': {'0': {'test': 'value'}}}}
which can be done with a loop that keeps creating nested dicts:
from configparser import ConfigParser

def parse(text):
    config = ConfigParser()
    config.read_string(text)
    root = {}
    for key in config['RESPONSE'].keys():
        curr = root
        # 'property[foo][0]' -> ['property', 'foo', '0']
        for key_part in key.replace(']', '').split('['):
            if key_part not in curr:
                curr[key_part] = {}
            prev = curr
            curr = curr[key_part]
        # Replace the placeholder dict at the last key part with the value.
        prev[key_part] = config['RESPONSE'][key]
    return root
Usage:
from pprint import pprint
text = """
[RESPONSE]
code = 200
description = Command completed successfully
queuetime = 0
runtime = 0.071
property[abuse policy][0] = The policies are published at the REGISTRY_OPERATOR website at:
property[abuse policy][1] = =>https://registry.in/Policies
property[abuse policy][2] =
property[abuse policy][3] = IN Policy Framework: https://registry.in/system/files/inpolicy_0.pdf
property[abuse policy][4] = IN Domain Anti-Abuse policy: https://registry.in/Policies/IN_Anti_Abuse_Policy
property[abuse policy url][0] = https://registry.in/Policies/IN_Anti_Abuse_Policy
property[active][0] = 0
property[foo][0][test] = a
property[foo][1][test] = b
property[bar][0][0] = A
property[bar][1][1] = B
"""
pprint(parse(text))
Result:
{'code': '200',
 'description': 'Command completed successfully',
 'property': {'abuse policy': {'0': 'The policies are published at the '
                                    'REGISTRY_OPERATOR website at:',
                               '1': '=>https://registry.in/Policies',
                               '2': '',
                               '3': 'IN Policy Framework: '
                                    'https://registry.in/system/files/inpolicy_0.pdf',
                               '4': 'IN Domain Anti-Abuse policy: '
                                    'https://registry.in/Policies/IN_Anti_Abuse_Policy'},
              'abuse policy url': {'0': 'https://registry.in/Policies/IN_Anti_Abuse_Policy'},
              'active': {'0': '0'},
              'bar': {'0': {'0': 'A'}, '1': {'1': 'B'}},
              'foo': {'0': {'test': 'a'}, '1': {'test': 'b'}}},
 'queuetime': '0',
 'runtime': '0.071'}
You could check whether key_part is numeric and convert it to int, so that the resulting structure behaves more like it contained lists, i.e.
{'property': {'foo': {0: {'test': 'value'}}}}
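The int-key idea could be sketched roughly as follows. This is only one possible take, not a definitive implementation: the names parse_with_int_keys and dicts_to_lists are hypothetical, interpolation=None is my own choice so that a literal % in a value is not treated as interpolation syntax, and the second helper goes one step further than int keys by assuming that any dict keyed exactly 0..n-1 was meant to be a list.

```python
from configparser import ConfigParser


def parse_with_int_keys(text, section="RESPONSE"):
    """Parse the response text, turning numeric key parts into int keys."""
    # Note: configparser lowercases option names by default.
    config = ConfigParser(interpolation=None)
    config.read_string(text)
    root = {}
    for key, value in config[section].items():
        # 'property[foo][0]' -> ['property', 'foo', 0]
        parts = [int(p) if p.isdecimal() else p
                 for p in key.replace("]", "").split("[")]
        curr = root
        # Walk down to the parent of the last part, creating dicts as needed.
        for part in parts[:-1]:
            curr = curr.setdefault(part, {})
        curr[parts[-1]] = value
    return root


def dicts_to_lists(node):
    """Hypothetical helper: turn dicts keyed exactly 0..n-1 into lists."""
    if not isinstance(node, dict):
        return node
    converted = {k: dicts_to_lists(v) for k, v in node.items()}
    keys_are_ints = all(isinstance(k, int) for k in converted)
    if converted and keys_are_ints and sorted(converted) == list(range(len(converted))):
        return [converted[i] for i in range(len(converted))]
    return converted
```

Calling dicts_to_lists(parse_with_int_keys(text)) on input containing property[active][0] = 0 and property[active][1][test] = TEST gives 'active': ['0', {'test': 'TEST'}]. A dict whose numeric keys have gaps (for example only 0 and 2) is left as a dict with int keys.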