How can I filter out duplicate ids from a list in a loop?


I have the list below. Each entry holds a withdrawal and the deposits clustered under it. I want to filter each withdrawal's deposits so that a deposit already present in one withdrawal's cluster does not appear in any other withdrawal's cluster. The example has 2 withdrawals, but the number may vary. I tried various lambda functions keyed on the deposit id, but I could not get the desired output. How can I do this?

exampleList = [
    {
      "withdrawal": {
        "amount": 250,
        "id": 70916631583,
        "date": "31-05-22 - 16:14:08",
        "paytype": "withdrawal"
      },
      "deposit": [
        {
          "id": 71018974368,
          "amount": 120,
          "date": "01-06-22 - 14:27:26",
          "paytype": "deposit"
        },
        {
          "id": 71018971332,
          "amount": 100,
          "date": "01-06-22 - 14:27:23",
          "paytype": "deposit"
        }
      ]
    },
    {
      "withdrawal": {
        "amount": 220,
        "id": 71019072820,
        "date": "01-06-22 - 14:28:40",
        "paytype": "withdrawal"
      },
      "deposit": [
        {
          "id": 71033338591,
          "amount": 100,
          "date": "01-06-22 - 17:03:19",
          "paytype": "deposit"
        },
        {
          "id": 71033144597,
          "amount": 250,
          "date": "01-06-22 - 17:01:20",
          "paytype": "deposit"
        },
        {
          "id": 71018974368,
          "amount": 120,
          "date": "01-06-22 - 14:27:26",
          "paytype": "deposit"
        },
        {
          "id": 71018971332,
          "amount": 100,
          "date": "01-06-22 - 14:27:23",
          "paytype": "deposit"
        }
      ]
    }
  ]

Example Output:

exampleOutputList = [
    {
      "withdrawal": {
        "amount": 250,
        "id": 70916631583,
        "date": "31-05-22 - 16:14:08",
        "paytype": "withdrawal"
      },
      "deposit": [
        {
          "id": 71018974368,
          "amount": 120,
          "date": "01-06-22 - 14:27:26",
          "paytype": "deposit"
        },
        {
          "id": 71018971332,
          "amount": 100,
          "date": "01-06-22 - 14:27:23",
          "paytype": "deposit"
        }
      ]
    },
    {
      "withdrawal": {
        "amount": 220,
        "id": 71019072820,
        "date": "01-06-22 - 14:28:40",
        "paytype": "withdrawal"
      },
      "deposit": [
        {
          "id": 71033338591,
          "amount": 100,
          "date": "01-06-22 - 17:03:19",
          "paytype": "deposit"
        },
        {
          "id": 71033144597,
          "amount": 250,
          "date": "01-06-22 - 17:01:20",
          "paytype": "deposit"
        }
        
      ]
    }
  ]

The deposits with ids 71018974368 and 71018971332 are absent from the second cluster in the sample output because they already appeared in the previous withdrawal's cluster. This is exactly what I want. There can be more than 2 withdrawal clusters, so hard-coding element indexes will not solve my problem.

I tried something like this. I expected it to collect the ids into an initially empty list and filter against it on each pass of the loop, but the output did not change.

listLen = len(exampleList)
testList = []
if(listLen > 0):
    while listLen > 0:
        listLen -= 1
        deposits = exampleList[listLen]['deposit']
        withDrawal = exampleList[listLen]['withdrawal']
        idList = [x['id'] for x in deposits]
        filterFromList = list(filter(lambda x:x['id'] not in testList, deposits))
        testList.append({"withdrawal" : withDrawal,"deposit" : filterFromList})
        
    print(testList)

Output

[{'withdrawal': {'amount': 220, 'id': 71019072820, 'date': '01-06-22 - 14:28:40', 'paytype': 'withdrawal'}, 'deposit': [{'id': 71033338591, 'amount': 100, 'date': '01-06-22 - 17:03:19', 'paytype': 'deposit'}, {'id': 71033144597, 'amount': 250, 'date': '01-06-22 - 17:01:20', 'paytype': 'deposit'}, {'id': 71018974368, 'amount': 120, 'date': '01-06-22 - 14:27:26', 'paytype': 'deposit'}, {'id': 71018971332, 'amount': 100, 'date': '01-06-22 - 14:27:23', 'paytype': 'deposit'}]}, {'withdrawal': {'amount': 250, 'id': 70916631583, 'date': '31-05-22 - 16:14:08', 'paytype': 'withdrawal'}, 'deposit': [{'id': 71018974368, 'amount': 120, 'date': '01-06-22 - 14:27:26', 'paytype': 'deposit'}, {'id': 71018971332, 'amount': 100, 'date': '01-06-22 - 14:27:23', 'paytype': 'deposit'}]}]

As the output shows, the same deposit ids and elements still appear under both withdrawals.


Solution

  • You could keep a set of already-seen ids as you traverse the data. For each cluster, build a side list of the deposits whose ids have not been seen yet, then replace the "deposit" list with it before advancing to the next cluster. This is a lot easier than trying to track indexes of the nested collections.

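    # Deposit ids that have already been kept by an earlier cluster
    seen = set()
    
    for cluster in exampleList:
        # Deposits of this cluster whose ids have not appeared before
        filtered = []
        for deposit in cluster["deposit"]:
            if deposit["id"] not in seen:
                seen.add(deposit["id"])
                filtered.append(deposit)
        # Slice assignment replaces the list contents in place
        cluster["deposit"][:] = filtered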
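
If you would rather leave the input list untouched, the same seen-set idea can be wrapped in a function that builds a new list. This is only a sketch: the name `dedupe_deposits` and the shortened `sample` data are made up for illustration, and it assumes every cluster has a "withdrawal" key and a "deposit" list of dicts with an "id" key, as in the question.

```python
def dedupe_deposits(clusters):
    """Return new clusters in which each deposit id is kept only once.

    The first cluster (in list order) that contains a deposit keeps it.
    """
    seen = set()  # deposit ids already assigned to an earlier cluster
    result = []
    for cluster in clusters:
        kept = []
        for deposit in cluster["deposit"]:
            if deposit["id"] not in seen:
                seen.add(deposit["id"])
                kept.append(deposit)
        # Build a new cluster dict so the input list is not modified
        result.append({"withdrawal": cluster["withdrawal"], "deposit": kept})
    return result


# Hypothetical, shortened data with the same shape as exampleList
sample = [
    {"withdrawal": {"id": 1}, "deposit": [{"id": 10}, {"id": 11}]},
    {"withdrawal": {"id": 2}, "deposit": [{"id": 12}, {"id": 10}, {"id": 11}]},
]
cleaned = dedupe_deposits(sample)
print([[d["id"] for d in c["deposit"]] for c in cleaned])  # [[10, 11], [12]]
```

The copy is shallow: the withdrawal and deposit dicts in the result are the same objects as in the input. As for the attempt in the question, `testList` holds whole cluster dicts rather than ids, so `x['id'] not in testList` is always true and nothing is filtered out. The `while` loop also walks the list from the last cluster to the first, so even with a proper id collection the later cluster would claim the shared deposits.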