I have the following sample dataset as a pandas DataFrame. Each key is unique to its namespace, and each value is also unique to its key:
 | image | namespace | key | value |
---|---|---|---|---|
0 | img1 | ns1 | organism | human |
1 | img1 | ns1 | organ | liver |
2 | img1 | ns2 | microscope | confocal |
3 | img2 | ns2 | microscope | confocal |
4 | img2 | ns2 | technique | widefield |
5 | img2 | ns2 | technique | phase-contrast |
6 | img2 | ns4 | analysis | segmentation |
I am trying to get a dict of dicts of dicts of lists out of it. The ideal outcome would look like:
{"img1":{"ns1":{"organism":["human"],"organ":["liver"]},
"ns2":{"microscope":["confocal"]}},
"img2":{"ns2":{"microscope":["confocal"],"technique":["widefield","phase-contrast"]},
"ns4":{"analysis":["segmentation"]}}}
I am sure I can somehow achieve this with a recursive .groupby, but I have tried and failed multiple times.
Please, can someone more competent point out an obvious answer?
Not being a pandas wizard, I would simply iterate over the rows using setdefault() to build your nested dictionary. In fact, I might be tempted to bypass pandas altogether.
import pandas
df = pandas.DataFrame([
    {"image": "img1", "namespace": "ns1", "key": "organism", "value": "human"},
    {"image": "img1", "namespace": "ns1", "key": "organ", "value": "liver"},
    {"image": "img1", "namespace": "ns2", "key": "microscope", "value": "confocal"},
    {"image": "img2", "namespace": "ns2", "key": "microscope", "value": "confocal"},
    {"image": "img2", "namespace": "ns2", "key": "technique", "value": "widefield"},
    {"image": "img2", "namespace": "ns2", "key": "technique", "value": "phase-contrast"},
    {"image": "img2", "namespace": "ns4", "key": "analysis", "value": "segmentation"},
])
results = {}
# Walk the rows once; setdefault() creates each nesting level the first time it is seen.
for _, row in df.iterrows():
    results \
        .setdefault(row["image"], {}) \
        .setdefault(row["namespace"], {}) \
        .setdefault(row["key"], []) \
        .append(row["value"])
import json
print(json.dumps(results, indent=4))
That will give you:
{
    "img1": {
        "ns1": {
            "organism": [
                "human"
            ],
            "organ": [
                "liver"
            ]
        },
        "ns2": {
            "microscope": [
                "confocal"
            ]
        }
    },
    "img2": {
        "ns2": {
            "microscope": [
                "confocal"
            ],
            "technique": [
                "widefield",
                "phase-contrast"
            ]
        },
        "ns4": {
            "analysis": [
                "segmentation"
            ]
        }
    }
}
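If you do bypass pandas, the same setdefault() chain works on plain records. Here is a minimal sketch, assuming the data is available as (image, namespace, key, value) tuples; the names rows and build_nested are made up for illustration:

```python
# Hypothetical input: the sample records as plain tuples, no DataFrame involved.
rows = [
    ("img1", "ns1", "organism", "human"),
    ("img1", "ns1", "organ", "liver"),
    ("img1", "ns2", "microscope", "confocal"),
    ("img2", "ns2", "microscope", "confocal"),
    ("img2", "ns2", "technique", "widefield"),
    ("img2", "ns2", "technique", "phase-contrast"),
    ("img2", "ns4", "analysis", "segmentation"),
]


def build_nested(records):
    """Group (image, namespace, key, value) records into dicts of dicts of dicts of lists."""
    nested = {}
    for image, namespace, key, value in records:
        # Each setdefault() returns the existing level or creates an empty one.
        nested.setdefault(image, {}) \
              .setdefault(namespace, {}) \
              .setdefault(key, []) \
              .append(value)
    return nested


results = build_nested(rows)
```

Since plain dicts keep insertion order, the images, namespaces, and keys appear in the order they are first seen, and repeated keys such as "technique" accumulate their values in one list.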