
How to delete a whole element (index) out of a json dictionary on a certain condition


I currently have a list of dictionaries containing legal decisions (391 decisions in total), which looks like this:

data = [{'text': "ECLI:NL:GHLEE:2002:AL8039   Instantie  Gerechtshof Leeuwarden  Datum uitspraak  18-10-2002  Datum publicatie  08-10-2003  Zaaknummer   BK 866/98 Vennootschapsbelasting   Rechtsgebieden   Belastingrecht   Bijzondere kenmerken"},
{'text': "ECLI:NL:GHARL:2014:5893   Instantie  Gerechtshof Arnhem-Leeuwarden  Datum uitspraak  15-07-2014  Datum publicatie  01-08-2014  Zaaknummer   14/00030   Formele relaties  Eerste aanleg: ECLI:NL:RBGEL:2013:4925 , Bekrachtiging/bevestiging   Rechtsgebieden   Belastingrecht   Bijzondere kenmerken"},
{'text': "ECLI:NL:GHARL:2015:7518   Instantie  Gerechtshof Arnhem-Leeuwarden  Datum uitspraak  06-10-2015  Datum publicatie  16-10-2015  Zaaknummer   14/01259   Formele relaties  Eerste aanleg: ECLI:NL:RBGEL:2014:6894 , Bekrachtiging/bevestiging Cassatie: ECLI:NL:HR:2016:2736    **Rechtsgebieden   Strafrecht** Bijzondere kenmerken"}]

In this project I want to delete every element whose string contains "Rechtsgebieden Strafrecht". To do that, I have to loop over all the elements and delete the whole entry at the matching index, in this case:

{'text': "ECLI:NL:GHARL:2015:7518   Instantie  Gerechtshof Arnhem-Leeuwarden  Datum uitspraak  06-10-2015  Datum publicatie  16-10-2015  Zaaknummer   14/01259   Formele relaties  Eerste aanleg: ECLI:NL:RBGEL:2014:6894 , Bekrachtiging/bevestiging Cassatie: ECLI:NL:HR:2016:2736    **Rechtsgebieden   Strafrecht** Bijzondere kenmerken"}

I thought of something like this, but I can't seem to find a solution to get the right index number (as data[d] does not work of course):

    substring = "Rechtsgebieden   Strafrecht"
    for d in data:
        if substring in str(d):
            del data[d]

Solution

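  • Please note the following points:

    1. You should test against d['text'] and not str(d). str(d) searches the printed form of the whole dictionary, including the key name and the quoting, rather than just the decision text.
    2. You should not modify your data while you are iterating over it. See this question for more details on why.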

    See this example of incorrect behavior, where the second matching entry is skipped and never removed:

    >>> data
    [{'text': 'some text'}, {'text': 'text with value to be removed'}, {'text': 'text with value to be removed BUT WILL NOT BE REMOVED'}, {'text': 'some other text'}, {'text': 'this text is also fine'}]
    >>> VALUE_TO_REMOVE
    'value to be removed'
    >>> for i, item in enumerate(data):
    ...     if VALUE_TO_REMOVE in item['text']:
    ...         del data[i]
    ... 
    >>> data
    [{'text': 'some text'}, {'text': 'text with value to be removed BUT WILL NOT BE REMOVED'}, {'text': 'some other text'}, {'text': 'this text is also fine'}]
    >>> 
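To see why the second match survives, it can help to record which entries the loop actually visits. The following is a minimal sketch of the same faulty loop; the `visited` list is a name introduced here purely for illustration.

```python
data = [
    {'text': 'some text'},
    {'text': 'text with value to be removed'},
    {'text': 'text with value to be removed BUT WILL NOT BE REMOVED'},
    {'text': 'some other text'},
]
VALUE_TO_REMOVE = 'value to be removed'

visited = []  # hypothetical helper: records every entry the loop looks at
for i, item in enumerate(data):
    visited.append(item['text'])
    if VALUE_TO_REMOVE in item['text']:
        # Deleting index i shifts every later entry one position to the left,
        # so the entry that slides into index i is never examined.
        del data[i]

for text in visited:
    print(text)
```

The entry ending in "BUT WILL NOT BE REMOVED" does not appear in the printed lines: after the deletion at index 1 it moves into index 1, while the loop continues with index 2.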
    
    3. Since you cannot safely remove entries from data during iteration, create a new list that contains only the dictionaries that satisfy your condition.

    The preferred approach:

    >>> data
    [{'text': 'some text'}, {'text': 'text with value to be removed'}, {'text': 'text with value to be removed BUT WILL NOT BE REMOVED'}, {'text': 'some other text'}, {'text': 'this text is also fine'}]
    >>> VALUE_TO_REMOVE
    'value to be removed'
    >>> new_data = [x for x in data if VALUE_TO_REMOVE not in x['text']]
    >>> new_data
    [{'text': 'some text'}, {'text': 'some other text'}, {'text': 'this text is also fine'}]
    >>> 
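Applied to the data from the question, the same filter might look like the sketch below. The three records are shortened stand-ins for the real decisions, not the full texts, and the slice assignment `data[:] = ...` is an optional variation for the case where other code holds a reference to the same list and should see the filtered result.

```python
# Shortened stand-ins for the decisions in the question (assumed, not the full texts).
data = [
    {'text': "ECLI:NL:GHLEE:2002:AL8039   Rechtsgebieden   Belastingrecht   Bijzondere kenmerken"},
    {'text': "ECLI:NL:GHARL:2014:5893   Rechtsgebieden   Belastingrecht   Bijzondere kenmerken"},
    {'text': "ECLI:NL:GHARL:2015:7518   Rechtsgebieden   Strafrecht Bijzondere kenmerken"},
]
substring = "Rechtsgebieden   Strafrecht"

same_list = data  # second reference, to show that the list object is reused

# Build the filtered list first, then replace the contents of the existing list.
data[:] = [d for d in data if substring not in d['text']]

print(len(data))          # 2
print(same_list is data)  # True
```

Note that the substring must match the spacing in the text exactly; here it uses three spaces between the two words, as in the question's code.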
    
    4. If your data is huge and you do not want to build a second list, you can instead mark the individual dictionaries with a 'DO NOT USE' key and check for that key later while processing. See below:

    >>> data
    [{'text': 'some text'}, {'text': 'text with value to be removed'}, {'text': 'text with value to be removed BUT WILL NOT BE REMOVED'}, {'text': 'some other text'}, {'text': 'this text is also fine'}]
    >>> for item in data:
    ...     if VALUE_TO_REMOVE in item['text']:
    ...         item['DO NOT USE'] = True
    ... 
    >>> data
    [{'text': 'some text'}, {'text': 'text with value to be removed', 'DO NOT USE': True}, {'text': 'text with value to be removed BUT WILL NOT BE REMOVED', 'DO NOT USE': True}, {'text': 'some other text'}, {'text': 'this text is also fine'}]
    >>>
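One way the flag could be used during later processing is sketched below. The generator name `usable` is hypothetical; it skips flagged entries lazily, so no second list is built.

```python
data = [
    {'text': 'some text'},
    {'text': 'text with value to be removed', 'DO NOT USE': True},
    {'text': 'some other text'},
]

def usable(items):
    """Yield only the dictionaries that were not flagged (hypothetical helper)."""
    for item in items:
        # dict.get returns None when the key is absent, so unflagged entries pass.
        if not item.get('DO NOT USE'):
            yield item

for item in usable(data):
    print(item['text'])
```

This prints the two unflagged texts and leaves `data` itself unchanged.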