Tags: python, scrapy, list-comprehension, strip

List Comprehension: Elegantly strip and remove empty elements in list


I am using the Scrapy library, and I frequently get lists whose elements contain '\t' and '\n'.

I'm trying to use a list comprehension to strip each element and drop the ones that end up empty, but the empty elements are still in the result.

Could someone explain how the interpreter processes this code? It seems that it checks for empty elements first, THEN strips the elements and inserts them into the list.

Thank you in advance!

# input
char_list = ['', '    a', 'b', '\t']
print(char_list)
char_list = [x.strip() for x in char_list if x != '']
print(char_list)

# output
['', '    a', 'b', '\t']
['a', 'b', '']

# desired output
['', '    a', 'b', '\t']
['a', 'b']

Solution

  • In this situation I usually split the work into two steps. The first step does the potentially expensive processing, and the second step does the filtering. The first step can be a generator expression, which avoids building an intermediate list:

    char_list_stripped = (x.strip() for x in char_list)
    char_list = [x for x in char_list_stripped if x]
    

    In this case, it saves you from calling x.strip() twice per element, once in the filter and once in the output expression, which is what packing everything into a single comprehension would require. The savings here are probably small (you likely won't notice a speed difference), but in the general case the difference can be significant, depending on how much work the processing involves.
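
To address the "how is the interpreter processing this" part of the question, here is a rough sketch that spells the original comprehension out as a plain loop and puts it next to the two working variants. The names original, single_pass, stripped and two_step are only illustrative and do not appear in the question. The loop is an approximation of what the comprehension does, not the interpreter's actual implementation.

```python
char_list = ['', '    a', 'b', '\t']

# Roughly what the original comprehension does: the `if` filter sees the
# unstripped element, and strip() is applied only to elements that pass.
original = []
for x in char_list:
    if x != '':                     # '\t' is not empty, so it passes the filter
        original.append(x.strip())  # ...and only here does it become ''

# Single comprehension: filters on the stripped value, but strip() is
# called a second time for every element that is kept.
single_pass = [x.strip() for x in char_list if x.strip()]

# Two steps: strip lazily with a generator, then filter on the stripped value.
stripped = (x.strip() for x in char_list)
two_step = [s for s in stripped if s]

print(original)     # ['a', 'b', '']
print(single_pass)  # ['a', 'b']
print(two_step)     # ['a', 'b']
```

The filter in a comprehension always runs on the loop variable as it comes out of the iterable, so any cleanup that should affect the filter has to happen either inside the condition itself or in an earlier step.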