
Merge a complex list of nested dicts


I'm trying to merge the nested dicts in a list based on their "name" key. My input looks like this:

[
    {
        "name": "abc",
        "metadata": [
            {
                "name": "foo",
                "data": [
                    {
                        "version": "1.0"
                    }
                ]
            },
            {
                "name": "foo",
                "data": [
                    {
                        "version": "2.0"
                    }
                ]
            },
            {
                "name": "bar",
                "data": [
                    {
                        "version": "1.0"
                    }
                ]
            }
        ]
    },
    {
        "name": "xyz",
        "metadata": [
            {
                "name": "bob",
                "data": [
                    {
                        "version": "3.2"
                    }
                ]
            },
            {
                "name": "alice",
                "data": [
                    {
                        "version": "2.2"
                    }
                ]
            }
        ]
    },
    {
        "name": "xyz",
        "metadata": [
            {
                "name": "mike",
                "data": [
                    {
                        "version": "3.2"
                    }
                ]
            },
            {
                "name": "alice",
                "data": [
                    {
                        "version": "2.2"
                    }
                ]
            }
        ]
    }
]

The merged items must not contain duplicate metadata entries: if an entry with the same name and the same data (version) already exists in the merged metadata, it should not be added again. How can I do this in Python?
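The uniqueness rule above can be made concrete with a hashable key. Python dicts cannot go into a set, so one option is to serialize each entry with `json.dumps(..., sort_keys=True)`. This is a minimal sketch, assuming two metadata entries count as duplicates exactly when they serialize to the same JSON (same name and same data/version list); the helper name `entry_key` is hypothetical.

```python
import json


def entry_key(entry):
    # Hypothetical helper: dicts are unhashable, so serialize the entry with
    # sorted keys into a string that can be stored in a set or used as a dict key.
    return json.dumps(entry, sort_keys=True)


a = {"name": "alice", "data": [{"version": "2.2"}]}
b = {"data": [{"version": "2.2"}], "name": "alice"}  # same content, different key order
c = {"name": "alice", "data": [{"version": "3.0"}]}  # same name, different version

print(entry_key(a) == entry_key(b))  # True: same name and version
print(entry_key(a) == entry_key(c))  # False: the version differs
```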

My desired output should look like this:

[
    {
        "name": "abc",
        "metadata": [
            {
                "name": "foo",
                "data": [
                    {
                        "version": "1.0"
                    }
                ]
            },
            {
                "name": "foo",
                "data": [
                    {
                        "version": "2.0"
                    }
                ]
            },
            {
                "name": "bar",
                "data": [
                    {
                        "version": "1.0"
                    }
                ]
            }
        ]
    },
    {
        "name": "xyz",
        "metadata": [
            {
                "name": "bob",
                "data": [
                    {
                        "version": "3.2"
                    }
                ]
            },
            {
                "name": "mike",
                "data": [
                    {
                        "version": "3.2"
                    }
                ]
            },
            {
                "name": "alice",
                "data": [
                    {
                        "version": "2.2"
                    }
                ]
            }
        ]
    }
]

Solution

  • You can use itertools.groupby:

    import itertools
    import json

    d = [{'name': 'abc', 'metadata': [{'name': 'foo', 'data': [{'version': '1.0'}]}, {'name': 'foo', 'data': [{'version': '2.0'}]}, {'name': 'bar', 'data': [{'version': '1.0'}]}]}, {'name': 'xyz', 'metadata': [{'name': 'bob', 'data': [{'version': '3.2'}]}, {'name': 'alice', 'data': [{'version': '2.2'}]}]}, {'name': 'xyz', 'metadata': [{'name': 'mike', 'data': [{'version': '3.2'}]}, {'name': 'alice', 'data': [{'version': '2.2'}]}]}]

    # groupby only groups consecutive items, so sort by 'name' first
    new_d = [[a, list(b)] for a, b in itertools.groupby(sorted(d, key=lambda x: x['name']), key=lambda x: x['name'])]
    # Concatenate the metadata lists of all dicts that share a name
    result = [{'name': a, 'metadata': [c for j in b for c in j['metadata']]} for a, b in new_d]
    # Keep an entry only if no equal entry appears earlier in the list
    # (dicts are unhashable, so entries are compared pairwise with !=)
    final_result = [{**i, 'metadata': [c for idx, c in enumerate(i['metadata']) if all(a != c for a in i['metadata'][:idx])]} for i in result]

    print(json.dumps(final_result, indent=4))

    Output:

    [
        {
            "name": "abc",
            "metadata": [
                {
                    "name": "foo",
                    "data": [
                        {
                            "version": "1.0"
                        }
                    ]
                },
                {
                    "name": "foo",
                    "data": [
                        {
                            "version": "2.0"
                        }
                    ]
                },
                {
                    "name": "bar",
                    "data": [
                        {
                            "version": "1.0"
                        }
                    ]
                }
            ]
        },
        {
            "name": "xyz",
            "metadata": [
                {
                    "name": "bob",
                    "data": [
                        {
                            "version": "3.2"
                        }
                    ]
                },
                {
                    "name": "alice",
                    "data": [
                        {
                            "version": "2.2"
                        }
                    ]
                },
                {
                    "name": "mike",
                    "data": [
                        {
                            "version": "3.2"
                        }
                    ]
                }
            ]
        }
    ]
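Because itertools.groupby only groups consecutive items, the approach above has to sort by name first, which reorders the top-level dicts alphabetically, and its duplicate check compares each entry against every earlier one. A possible alternative, sketched here under the assumption that the first-seen order of names should be kept, accumulates groups in a plain dict (insertion-ordered since Python 3.7) and tracks already-kept metadata entries in a set of JSON strings, so each membership check is a hash lookup. The function name `merge_by_name` is hypothetical, not part of any library.

```python
import json


def merge_by_name(items):
    """Merge dicts sharing the same 'name', keeping only unique metadata entries.

    Hypothetical helper: names keep their first-seen order, and metadata
    entries keep the order in which they first appear.
    """
    merged = {}  # name -> merged dict (dicts preserve insertion order)
    seen = {}    # name -> set of serialized metadata entries already kept
    for item in items:
        name = item["name"]
        if name not in merged:
            merged[name] = {"name": name, "metadata": []}
            seen[name] = set()
        for entry in item["metadata"]:
            # Serialize with sorted keys so equal entries give equal strings
            key = json.dumps(entry, sort_keys=True)
            if key not in seen[name]:
                seen[name].add(key)
                merged[name]["metadata"].append(entry)
    return list(merged.values())


d = [
    {"name": "abc", "metadata": [
        {"name": "foo", "data": [{"version": "1.0"}]},
        {"name": "foo", "data": [{"version": "2.0"}]},
        {"name": "bar", "data": [{"version": "1.0"}]}]},
    {"name": "xyz", "metadata": [
        {"name": "bob", "data": [{"version": "3.2"}]},
        {"name": "alice", "data": [{"version": "2.2"}]}]},
    {"name": "xyz", "metadata": [
        {"name": "mike", "data": [{"version": "3.2"}]},
        {"name": "alice", "data": [{"version": "2.2"}]}]},
]

print(json.dumps(merge_by_name(d), indent=4))
```

For this input the result has the same content as the groupby version: the merged "xyz" metadata comes out as bob, alice, mike (the order in which entries first appear), not the bob, mike, alice order shown in the question's desired output.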