I have over a thousand files with remarks in HTML format. Some of them have leading spaces, some have extra spaces between words, and there is one specific remark that shows up often that I want to exclude.
I have created a function to strip the html tags (strip_tags()). This accomplishes what I want:
stripped_remarks = [" ".join(strip_tags(rem).split()) for rem in remarks] #removes extra spaces and html tags
stripped_remarks = [rem for rem in stripped_remarks if rem != r'garbage text ***'] #removes the garbage remark from the cleaned list
I can make this one line by changing the "if rem" part so it strips the spaces and html tags like it does before "for", but that seems to do the work twice when it's not necessary. Is it possible to do something like this?
stripped_remarks = [" ".join(strip_tags(rem).split()) as strip_rem for rem in remarks if strip_rem != r'garbage text ***']
By defining strip_rem within the comprehension and reusing it for my conditional, I could easily make this one line without stripping the extra spaces or html tags twice. But is it possible?
Using the 'walrus operator' introduced in Python 3.8, this should work:
stripped_remarks = [strip_rem for rem in remarks if (strip_rem := " ".join(strip_tags(rem).split())) != r'garbage text ***']
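To make that concrete, here is a small runnable sketch. The strip_tags() below is only a hypothetical regex stand-in for your own function, and the sample remarks and the GARBAGE constant are made up for illustration. It also shows an alternative that works before Python 3.8: do the cleaning in an inner generator expression and filter in the outer comprehension, so each remark is still cleaned only once.

```python
import re


def strip_tags(html):
    # Hypothetical stand-in for the asker's strip_tags(); a crude regex,
    # not a real HTML parser.
    return re.sub(r"<[^>]+>", "", html)


GARBAGE = "garbage text ***"  # assumed name for the remark to exclude

# Made-up sample data: leading spaces, extra inner spaces, one garbage remark.
remarks = [
    "  <p>first   remark</p>",
    "<b>garbage   text</b> ***",
    "<i>second</i>  remark  ",
]

# Python 3.8+: bind the cleaned value in the condition, reuse it as the output.
with_walrus = [
    clean
    for rem in remarks
    if (clean := " ".join(strip_tags(rem).split())) != GARBAGE
]

# Any Python 3: clean in an inner generator, filter in the outer comprehension.
with_generator = [
    clean
    for clean in (" ".join(strip_tags(rem).split()) for rem in remarks)
    if clean != GARBAGE
]

print(with_walrus)
print(with_generator)
```

Both versions call strip_tags() once per remark. One thing to be aware of with the walrus version: the name bound by := lives in the scope that contains the comprehension, so clean is still defined after the list is built.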