Tags: python, string, list, dictionary, tokenize

How to delete words that are longer than a certain length in a list of dictionaries


I have a list of dictionaries like this:

myList = [
    {
        'id': 1,
        'text': ['I like cheese.', 'I love cheese.', 'oh Ilikecheese !'],
        'text_2': [('david', 'david', 'I do not like cheese.'),
                   ('david', 'david', 'cheese is good.')]
    },
    {
        'id': 2,
        'text': ['I like strawberry.', 'I love strawberry'],
        'text_2': [('alice', 'alice', 'strawberry is good.'),
                   ('alice', 'alice', ' strawberry is so so.')]
    }
]

I want to delete the words that are longer than a certain number of letters (e.g. 9 letters).

The ideal output is the same list of dictionaries with the misspelled words deleted, for example with "Ilikecheese" removed:

myList = [
    {
        'id': 1,
        'text': ['I like cheese.', 'I love cheese.', 'oh!'],
        'text_2': [('david', 'david', 'I do not like cheese.'),
                   ('david', 'david', 'cheese is good.')]
    },
    {
        'id': 2,
        'text': ['I like strawberry.', 'I love strawberry'],
        'text_2': [('alice', 'alice', 'strawberry is good.'),
                   ('alice', 'alice', ' strawberry is so so.')]
    }
]

Any suggestions?


Solution

  • Remove every word of 9 or more characters from each string. Strings are split on a single space, so punctuation attached to a word counts toward its length.

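    # myList is the list of dictionaries from the question

    for d in myList:
        for k, v in d.items():
            if isinstance(v, list):
                for i, item in enumerate(v):
                    # only plain strings are filtered; other items (e.g. tuples) are left untouched
                    if isinstance(item, str):
                        v[i] = ' '.join(w for w in item.split(' ') if len(w) < 9)

    for d in myList:
        print(d)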
    

    Output (produced from a different myList than the one in the question, with only 'id' and 'text' keys)

    {'id': 1, 'text': ["I 'll tell you what . Next say ' Potts ' on the tower .", 'I assume . Light her up .', 'Cap , I need the lever !']}
    {'id': 2, 'text': ['Dr. Banner .', 'Stark , we need a plan of attack !', '( taken by that )', 'Everyone ! Clear out !', "Think the guy 's a friendly ?", 'Those people need .', 'Then suit up .']}
    

    If the values are tuples instead of lists

    for d in myList:
        for k, v in d.items():
            if isinstance(v, tuple):
                v = list(v)
                for i, word in enumerate(v):               
                    v[i] = ' '.join([w for w in word.split(' ') if len(w) < 9])
                d[k] = tuple(v)
    
    for d in myList: print(d)
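
The two loops above each handle one container type, but the data in the question mixes them: 'text' is a list of strings and 'text_2' is a list of tuples of strings. A minimal sketch of one way to cover both, assuming the same rule as above (split on a single space, keep words shorter than 9 characters). The helper name drop_long_words, its max_len parameter, and the shortened sample are illustrative and not part of the original answer:

```python
def drop_long_words(value, max_len=9):
    """Return a copy of value with words of max_len or more characters removed.

    Strings are split on a single space; lists and tuples are processed
    element by element; any other value (e.g. the integer id) is returned as is.
    """
    if isinstance(value, str):
        return ' '.join(w for w in value.split(' ') if len(w) < max_len)
    if isinstance(value, list):
        return [drop_long_words(item, max_len) for item in value]
    if isinstance(value, tuple):
        return tuple(drop_long_words(item, max_len) for item in value)
    return value


# Shortened, hypothetical sample shaped like the data in the question
sample = [
    {
        'id': 1,
        'text': ['I like cheese.', 'oh Ilikecheese !'],
        'text_2': [('david', 'david', 'cheese is good.')],
    },
]

# Build new dictionaries instead of modifying the originals in place
cleaned = [{k: drop_long_words(v) for k, v in d.items()} for d in sample]
print(cleaned)
```

Because the length check includes attached punctuation, a token such as 'strawberry.' is also dropped at this threshold. Passing a larger max_len, or stripping punctuation before measuring, would be needed to keep such words.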