I have metadata extracted from an SEM acquisition, stored in three separate variables: acquisition_metadata, dataset_metadata, and image_metadata. Every dictionary in them contains key-value pairs whose keys are dot-separated strings representing the hierarchy levels.
acquisition_metadata is a single dictionary of that form.
dataset_metadata is a list of dictionaries of the same structure, where each dictionary holds the metadata for one dataset within the acquisition.
image_metadata is a list of lists: each element corresponds to a dataset and contains a list of dictionaries, one for each image within that dataset.
I need to combine these three structures into one nested dictionary in Python (and eventually a JSON file), where the keys represent the hierarchy levels. For example, 'acquisition.dataset.images.creationTime': '18.08.2020 17:51:07' means that the value '18.08.2020 17:51:07' should be stored under acquisition{dataset{images{creationTime: '18.08.2020 17:51:07'}}}.
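To make the key-to-path mapping concrete, here is a minimal sketch of turning one dot-separated key into nested dictionaries. The helper name set_nested and its sep parameter are made up for illustration and are not part of the original code.

```python
def set_nested(target, dotted_key, value, sep="."):
    """Store value in target under the path spelled out by dotted_key."""
    # Everything before the last separator is a parent level; the rest is the leaf.
    *parents, leaf = dotted_key.split(sep)
    node = target
    for name in parents:
        # Reuse the level if it already exists, otherwise create it.
        node = node.setdefault(name, {})
    node[leaf] = value
    return target


nested = set_nested({}, "acquisition.dataset.images.creationTime", "18.08.2020 17:51:07")
print(nested)
```

This only covers plain dictionaries; the lists under "dataset" and "images" need separate handling.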
The main issue arises with the lists and the nested structure. I can't get the arrays under "dataset" and "images" to build dynamically the way I want: either the "acquisition", "dataset", and/or "images" keys are repeated inside entries that already sit under them, or the image dictionaries end up outside the array of datasets. The chatbot has gotten me close, but no matter how I describe the issue it can't get it right. It also insists on hardcoding the level names/keys, which I want to avoid.
For reference, the combined dictionary (and output JSON) should have the following structure (obviously not every key/variable is shown) when created with the variables in my minimal working example:
metadata = {
    'acquisition': {
        'genericMetadata': {
            'program': {
                'programName': 'Auto Slice & View 4',
                'programVersion': '4.2.1.1982'
            },
            'applicationId': {
                'identifierValue': 'ASV'
            },
            'fileVersion': '1.2',
            'projectName': '20200818_AlSi13 XRM tomo2',
            'numberOfCuts': '719'
        },
        'dataset': [
            {
                'rows': '1',
                'columns': '1',
                'images': [
                    {
                        'creationTime': '18.08.2020 17:51:07',
                        'stage': {
                            'workingDistance': {
                                'value': '0.00403678'
                            }
                        }
                    },
                    {
                        'creationTime': '18.08.2020 18:09:06',
                        'stage': {
                            'workingDistance': {
                                'value': '0.00403773'
                            }
                        }
                    }
                ]
            },
            {
                'rows': '1',
                'columns': '1',
                'images': [
                    {
                        'creationTime': '18.08.2020 17:51:07',
                        'stage': {
                            'workingDistance': {
                                'value': '0.00403678'
                            }
                        }
                    },
                    {
                        'creationTime': '18.08.2020 18:09:06',
                        'stage': {
                            'workingDistance': {
                                'value': '0.00403773'
                            }
                        }
                    }
                ]
            }
        ]
    }
}
Here is a minimal working example of what the dictionaries look like that I am inputting into such a function. You can copy and paste this into your IDE to recreate the inputs I'm working with.
acquisition_metadata = {
    'acquisition.genericMetadata.program.programName': 'Auto Slice & View 4',
    'acquisition.genericMetadata.program.programVersion': '4.2.1.1982',
    'acquisition.genericMetadata.applicationId.identifierValue': 'ASV',
    'acquisition.genericMetadata.fileVersion': '1.2',
    'acquisition.genericMetadata.projectName': '20200818_AlSi13 XRM tomo2',
    'acquisition.genericMetadata.numberOfCuts': '719',
}

dataset_metadata = [
    {
        'acquisition.dataset.rows': '1',
        'acquisition.dataset.columns': '1',
    },
    {
        'acquisition.dataset.rows': '1',
        'acquisition.dataset.columns': '1',
    },
]

image_metadata = [
    [
        {
            'acquisition.dataset.images.creationTime': '18.08.2020 17:51:07',
            'acquisition.dataset.images.stage.workingDistance.value': '0.00403678',
        },
        {
            'acquisition.dataset.images.creationTime': '18.08.2020 18:09:06',
            'acquisition.dataset.images.stage.workingDistance.value': '0.00403773',
        }
    ],
    [
        {
            'acquisition.dataset.images.creationTime': '18.08.2020 17:51:07',
            'acquisition.dataset.images.stage.workingDistance.value': '0.00403678',
        },
        {
            'acquisition.dataset.images.creationTime': '18.08.2020 18:09:06',
            'acquisition.dataset.images.stage.workingDistance.value': '0.00403773',
        }
    ]
]
Here is what I have tried (with the help of our friend "Gee Pee Tee"):
import json
import os


def combine_metadata(acquisition_metadata, dataset_metadata, image_metadata):
    metadata = {}

    # Combine acquisition metadata
    for key, value in acquisition_metadata.items():
        nested_keys = key.split('.')
        current_dict = metadata
        for nested_key in nested_keys[:-1]:
            if nested_key not in current_dict:
                current_dict[nested_key] = {}
            current_dict = current_dict[nested_key]
        current_dict[nested_keys[-1]] = value

    # Combine dataset metadata
    metadata['acquisition']['dataset'] = []
    for dataset in dataset_metadata:
        dataset_dict = {}
        for key, value in dataset.items():
            nested_keys = key.split('.')
            current_dict = dataset_dict
            for nested_key in nested_keys[:-1]:
                if nested_key not in current_dict:
                    current_dict[nested_key] = {}
                current_dict = current_dict[nested_key]
            current_dict[nested_keys[-1]] = value
        metadata['acquisition']['dataset'].append(dataset_dict)

    # Combine image metadata
    for i, images in enumerate(image_metadata):
        metadata['acquisition']['dataset'][i]['images'] = []
        for image in images:
            image_dict = {}
            for key, value in image.items():
                nested_keys = key.split('.')
                current_dict = image_dict
                for nested_key in nested_keys[:-1]:
                    if nested_key not in current_dict:
                        current_dict[nested_key] = {}
                    current_dict = current_dict[nested_key]
                current_dict[nested_keys[-1]] = value
            metadata['acquisition']['dataset'][i]['images'].append(image_dict)

    return metadata


def save_metadata_as_json(metadata, save_path):
    filename = os.path.join(save_path, "combined.json")
    with open(filename, 'w') as file:
        json.dump(metadata, file, indent=4)
    print(f"Metadata saved as {filename}")
But it produces this output:
{
    "acquisition": {
        "genericMetadata": {
            "program": {
                "programName": "Auto Slice & View 4",
                "programVersion": "4.2.1.1982"
            },
            "applicationId": {
                "identifierValue": "ASV"
            },
            "fileVersion": "1.2",
            "projectName": "20200818_AlSi13 XRM tomo2",
            "numberOfCuts": "719"
        },
        "dataset": [
            {
                "acquisition": {
                    "dataset": {
                        "rows": "1",
                        "columns": "1"
                    }
                },
                "images": [
                    {
                        "acquisition": {
                            "dataset": {
                                "images": {
                                    "creationTime": "18.08.2020 17:51:07",
                                    "stage": {
                                        "workingDistance": {
                                            "value": "0.00403678"
                                        }
                                    }
                                }
                            }
                        }
                    },
                    {
                        "acquisition": {
                            "dataset": {
                                "images": {
                                    "creationTime": "18.08.2020 18:09:06",
                                    "stage": {
                                        "workingDistance": {
                                            "value": "0.00403773"
                                        }
                                    }
                                }
                            }
                        }
                    }
                ]
            },
            {
                "acquisition": {
                    "dataset": {
                        "rows": "1",
                        "columns": "1"
                    }
                },
                "images": [
                    {
                        "acquisition": {
                            "dataset": {
                                "images": {
                                    "creationTime": "18.08.2020 17:51:07",
                                    "stage": {
                                        "workingDistance": {
                                            "value": "0.00403678"
                                        }
                                    }
                                }
                            }
                        }
                    },
                    {
                        "acquisition": {
                            "dataset": {
                                "images": {
                                    "creationTime": "18.08.2020 18:09:06",
                                    "stage": {
                                        "workingDistance": {
                                            "value": "0.00403773"
                                        }
                                    }
                                }
                            }
                        }
                    }
                ]
            }
        ]
    }
}
where you can see the redundant level names I was talking about...
In short, I need a function that takes the dictionaries above as input and creates a nested dictionary structured as shown earlier. I will eventually write this combined dictionary to a JSON file, so if it's easier to go directly to the JSON output, I'll take that as well.
You almost had it. Note the nested_keys.remove(...) calls, which drop the leading levels that the list entries already sit under:

def combine_metadata(acquisition_metadata, dataset_metadata, image_metadata):
    metadata = {}

    # Combine acquisition metadata
    for key, value in acquisition_metadata.items():
        nested_keys = key.split('.')
        current_dict = metadata
        for nested_key in nested_keys[:-1]:
            if nested_key not in current_dict:
                current_dict[nested_key] = {}
            current_dict = current_dict[nested_key]
        current_dict[nested_keys[-1]] = value

    # Combine dataset metadata
    metadata['acquisition']['dataset'] = []
    for dataset in dataset_metadata:
        dataset_dict = {}
        for key, value in dataset.items():
            nested_keys = key.split('.')
            # Each entry already lives under acquisition -> dataset,
            # so drop those two levels (remove() deletes the first match)
            nested_keys.remove('acquisition')
            nested_keys.remove('dataset')
            current_dict = dataset_dict
            for nested_key in nested_keys[:-1]:
                if nested_key not in current_dict:
                    current_dict[nested_key] = {}
                current_dict = current_dict[nested_key]
            current_dict[nested_keys[-1]] = value
        metadata['acquisition']['dataset'].append(dataset_dict)

    # Combine image metadata
    for i, images in enumerate(image_metadata):
        metadata['acquisition']['dataset'][i]['images'] = []
        for image in images:
            image_dict = {}
            for key, value in image.items():
                nested_keys = key.split('.')
                # Each entry already lives under acquisition -> dataset -> images
                nested_keys.remove('acquisition')
                nested_keys.remove('dataset')
                nested_keys.remove('images')
                current_dict = image_dict
                for nested_key in nested_keys[:-1]:
                    if nested_key not in current_dict:
                        current_dict[nested_key] = {}
                    current_dict = current_dict[nested_key]
                current_dict[nested_keys[-1]] = value
            metadata['acquisition']['dataset'][i]['images'].append(image_dict)

    return metadata
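Since the question asks to avoid hardcoded level names, one possible variation is to derive the levels to strip from the keys themselves. The sketch below is not part of the answer above; the names nest, common_prefix, and combine are hypothetical. It assumes that all dataset keys share a leading path ending in the list name (here acquisition.dataset), and that all image keys share that same path plus one extra level (here images). If every key of a level happened to share a deeper parent, the detected prefix would be too long, so treat this as a starting point rather than a general solution.

```python
import json


def nest(flat, skip=0, sep="."):
    """Build a nested dict from dotted keys, dropping the first `skip` levels."""
    root = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(sep)[skip:]
        node = root
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return root


def common_prefix(flat_dicts, sep="."):
    """Leading key components shared by every key (the leaf is never included)."""
    paths = [key.split(sep)[:-1] for d in flat_dicts for key in d]
    prefix = []
    for parts in zip(*paths):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return prefix


def combine(acquisition, datasets, images, sep="."):
    metadata = nest(acquisition, sep=sep)
    ds_prefix = common_prefix(datasets, sep)
    im_prefix = common_prefix([im for group in images for im in group], sep)
    # Assumption: the image list sits exactly one level below each dataset entry.
    if im_prefix[:-1] != ds_prefix:
        raise ValueError("image keys do not extend the dataset prefix by one level")

    # Walk to the dict that should hold the dataset list, creating levels as needed.
    parent = metadata
    for name in ds_prefix[:-1]:
        parent = parent.setdefault(name, {})
    dataset_list = parent[ds_prefix[-1]] = []

    for ds, group in zip(datasets, images):
        entry = nest(ds, skip=len(ds_prefix), sep=sep)
        entry[im_prefix[-1]] = [nest(im, skip=len(im_prefix), sep=sep) for im in group]
        dataset_list.append(entry)
    return metadata


# Small sample input in the same shape as the question's variables.
acq = {"acquisition.genericMetadata.fileVersion": "1.2"}
datasets = [{"acquisition.dataset.rows": "1", "acquisition.dataset.columns": "1"}]
images = [[{
    "acquisition.dataset.images.creationTime": "18.08.2020 17:51:07",
    "acquisition.dataset.images.stage.workingDistance.value": "0.00403678",
}]]

combined = combine(acq, datasets, images)
print(json.dumps(combined, indent=4))
```

Passing the two prefixes in as arguments instead of detecting them would be a more predictable alternative if the key layout is known in advance.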