python json dictionary diff deep-diff

How to compare two lists of objects and keep only _new_ objects?


I have two JSON files, new.json and old.json, that both contain some data. I want to compare new.json with old.json. If both files contain the same data, I don't want to create any new JSON file.

Suppose new.json and old.json contain different data, like this:

new.json:

[
 {
    "name": "Mohan raj",
    "age": 23,
    "country": "INDIA"
 },
 {
    "name": "Kiruthika",
    "age": 18,
    "country": "INDIA"
 },
 {
    "name": "Munusamy",
    "age": 45,
    "country": "INDIA"
 },
 {
    "name": "John Wood",
    "age": 35,
    "country": "USA"
 },
 {
    "name": "Mark Smith",
    "age": 25,
    "country": "USA"
 }
]

old.json:

[
 {
    "name": "John Wood",
    "age": 35,
    "country": "USA"
 },
 {
    "name": "Mark Smith",
    "age": 30,
    "country": "USA"
 },
 {
    "name": "Oscar Bernard",
    "age": 25,
    "country": "Australia"
 }
]

If new.json contains an object that is identical to one in old.json, that object should be skipped. If new.json contains an updated version of an object in old.json, or an object that is not in old.json at all, those objects should be written to a new JSON file named updated.json.

The resulting JSON file needs to look like this:

updated.json:

[
 {
    "name": "Mohan raj",
    "age": 23,
    "country": "INDIA"
 },
 {
    "name": "Kiruthika",
    "age": 18,
    "country": "INDIA"
 },
 {
    "name": "Munusamy",
    "age": 45,
    "country": "INDIA"
 },
 {
    "name": "Mark Smith",
    "age": 25,
    "country": "USA"
 }
]

Solution

  • It took me a while to understand the requirement (thanks for answering my questions), but it seems "updated" can simply be expressed as "new not in old".

    I think so, because the following seems to do the job.

    The key is to compare the objects themselves. Rather than getting into deep object comparison (deep-equal), serializing each object back to JSON gives us a string representation we can compare. This is not a true hash, just a canonical string; passing sort_keys=True keeps the comparison from depending on the order of the keys:

    import json

    # Serialize every old object to a canonical JSON string.
    # A set gives fast membership checks.
    with open('old.json') as f:
        old_hashes = {json.dumps(old_obj, sort_keys=True) for old_obj in json.load(f)}

    with open('new.json') as f:
        new_objs = json.load(f)

    # "Updated" means "new not in old"
    updated_objs = [
        new_obj for new_obj in new_objs
        if json.dumps(new_obj, sort_keys=True) not in old_hashes
    ]

    print(json.dumps(updated_objs, indent=2))
    

    When I run that against your old.json and new.json, I get:

    [
      {
        "name": "Mohan raj",
        "age": 23,
        "country": "INDIA"
      },
      {
        "name": "Kiruthika",
        "age": 18,
        "country": "INDIA"
      },
      {
        "name": "Munusamy",
        "age": 45,
        "country": "INDIA"
      },
      {
        "name": "Mark Smith",
        "age": 25,
        "country": "USA"
      }
    ]
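
The question also asks that updated.json be created only when there is something to put in it, while the script above only prints. A minimal sketch of that last step could look like the following. The helper names `canonical` and `write_updated` are made up for illustration, and it assumes that "no differences" should mean "no file is written at all".

```python
import json
from pathlib import Path


def canonical(obj):
    """Return a JSON string that does not depend on key order."""
    return json.dumps(obj, sort_keys=True)


def write_updated(old_path, new_path, out_path):
    """Write the objects from new_path that are not in old_path to out_path.

    Returns the list of updated objects. out_path is only written when
    that list is non-empty (assumption: no differences means no file).
    """
    old_seen = {canonical(o) for o in json.loads(Path(old_path).read_text())}
    new_objs = json.loads(Path(new_path).read_text())

    # "Updated" means "new not in old"; the order of new.json is preserved.
    updated = [o for o in new_objs if canonical(o) not in old_seen]

    if updated:
        Path(out_path).write_text(json.dumps(updated, indent=2))
    return updated
```

It could then be called as `write_updated('old.json', 'new.json', 'updated.json')`. Note that this sketch does not delete an updated.json left over from an earlier run; if that matters, remove the file before calling it.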