
append() function does not work when trying to remove duplicates in json file - Python


I have a JSON file that contains duplicate entries, identified by the userid field. I want to remove those duplicates using append() and write the result to a new file in the same format as the original.

Here is the JSON file:

    [
        {
            "userid": "7126521576",
            "status": "UserStatus.OFFLINE",
            "name": "Avril Pauling",
            "bot": false,
            "username": "None"
        },
        {
            "userid": "7126521576",
            "status": "UserStatus.OFFLINE",
            "name": "Avril Pauling",
            "bot": false,
            "username": "None"
        },
        {
            "userid": "6571627119",
            "status": "UserStatus.OFFLINE",
            "name": "Laverne Alferez",
            "bot": false,
            "username": "None"
        },
        {
            "userid": "1995422560",
            "status": "UserStatus.OFFLINE",
            "name": "098767800",
            "bot": false,
            "username": "None"
        }
    ]

After removing the duplicate userids, the output file should be:

    [
        {
            "userid": "7126521576",
            "status": "UserStatus.OFFLINE",
            "name": "Avril Pauling",
            "bot": false,
            "username": "None"
        },
        {
            "userid": "6571627119",
            "status": "UserStatus.OFFLINE",
            "name": "Laverne Alferez",
            "bot": false,
            "username": "None"
        },
        {
            "userid": "1995422560",
            "status": "UserStatus.OFFLINE",
            "name": "098767800",
            "bot": false,
            "username": "None"
        }
    ]

I tried the following code, but append() does not appear to work correctly; it only appends the last item:

    import json
    with open('target_user.json', 'r', encoding='utf-8') as f:
        jsons = json.load(f)

    jsons2 = []
    for item in jsons:
        if item['userid'] not in json2:
            jsons2.append(item)
            
    with open('target_user2.json', 'w', encoding='utf-8') as nf:
        json.dump(jsons2, nf, indent=4)

Any quick help would be much appreciated.


Solution

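  • This should do what you need. The original check tests a userid string against a list of dicts, which can never match, so no duplicates are filtered out (json2 there is also a typo for jsons2). Tracking the userids already seen in a set fixes it: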

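    import json

    # Load the original list of user records
    with open('target_user.json', 'r', encoding='utf-8') as f:
        jsons = json.load(f)

    ids = set()    # userids seen so far
    jsons2 = []    # records to keep, in their original order
    for item in jsons:
        # Keep only the first record for each userid
        if item['userid'] not in ids:
            ids.add(item['userid'])
            jsons2.append(item)

    # Write the de-duplicated list in the same format as the input
    with open('target_user2.json', 'w', encoding='utf-8') as nf:
        json.dump(jsons2, nf, indent=4)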
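
If you need the same thing for other keys, the loop can be wrapped in a small helper. This is only a sketch: the name dedupe_by_key and the shortened sample list are made up for illustration and are not part of the original code. It also shows why a plain membership test against the list of records never finds a match.

```python
def dedupe_by_key(records, key):
    """Return records with duplicate `key` values dropped, keeping the first of each."""
    seen = set()      # key values encountered so far
    unique = []       # first record for each key value, in input order
    for record in records:
        value = record[key]
        if value not in seen:
            seen.add(value)
            unique.append(record)
    return unique


# Shortened, hypothetical sample modelled on the question's data
sample = [
    {"userid": "7126521576", "name": "Avril Pauling"},
    {"userid": "7126521576", "name": "Avril Pauling"},
    {"userid": "6571627119", "name": "Laverne Alferez"},
]

# A userid string never equals a dict, so testing it against the list of
# records is always False; the separate set of seen IDs makes the check work.
print("7126521576" in sample)                  # False
print(len(dedupe_by_key(sample, "userid")))    # 2
```

The helper keeps the first record for each userid and preserves the input order, so the result can be passed straight to json.dump() as in the answer above.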