I have a few lines of text and want to remove any word that contains a special character or a given fixed substring (in Python).
Example:
in_lines = ['this is go:od',
            'that example is bad',
            'amp is a word']
# remove any word with {'amp', ':'}
out_lines = ['this is',
             'that is bad',
             'is a word']
I know how to remove words that exactly match an entry in a given list, but not how to remove words that merely contain a special character or a few given letters. Please let me know if anything is unclear and I'll add more information.
This is what I have for removing selected words:
def remove_stop_words(lines):
    stop_words = ['am', 'is', 'are']
    results = []
    for text in lines:
        tmp = text.split(' ')
        for stop_word in stop_words:
            for x in range(0, len(tmp)):
                if tmp[x] == stop_word:
                    tmp[x] = ''
        results.append(" ".join(tmp))
    return results

out_lines = remove_stop_words(in_lines)
in_lines = ['this is go:od',
            'that example is bad',
            'amp is a word']

def remove_words(in_list, bad_list):
    out_list = []
    for line in in_list:
        words = ' '.join([word for word in line.split() if not any([phrase in word for phrase in bad_list])])
        out_list.append(words)
    return out_list

out_lines = remove_words(in_lines, ['amp', ':'])
print(out_lines)
Strange as it sounds, the statement
word for word in line.split() if not any([phrase in word for phrase in bad_list])
does all the hard work here at once. It creates a list of True/False values for each phrase in the "bad" list applied to a single word. The any function condenses this temporary list into a single True/False value again, and if this is False then the word can safely be copied into the line-based output list.
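To make the intermediate values visible, here is a small sketch that runs the inner check on a few words from the example. The helper name phrase_hits is made up for this illustration and is not part of the function above.

```python
bad_list = ['amp', ':']

def phrase_hits(word, bad_list):
    # Hypothetical helper: one True/False per phrase,
    # telling whether that phrase is a substring of the word.
    return [phrase in word for phrase in bad_list]

for word in ['example', 'go:od', 'bad']:
    hits = phrase_hits(word, bad_list)
    # any() is True as soon as at least one phrase matched
    print(word, hits, any(hits))

# example [True, False] True
# go:od [False, True] True
# bad [False, False] False
```

Only 'bad' ends up with any(...) being False, so it is the only one of these three words that would be kept.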
As an example, the result of removing all words containing an a looks like this:
>>> remove_words(in_lines, ['a'])
['this is go:od', 'is', 'is word']
(It is possible to remove the for line in ... line as well. At that point, readability really starts to suffer, though.)
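For completeness, this is roughly what that fully nested variant could look like. It is only a sketch: the name remove_words_nested is made up here, and it uses generator expressions instead of temporary lists, which does not change the result.

```python
def remove_words_nested(in_list, bad_list):
    # Outer comprehension walks the lines, inner generator filters the words.
    return [
        ' '.join(word for word in line.split()
                 if not any(phrase in word for phrase in bad_list))
        for line in in_list
    ]

in_lines = ['this is go:od',
            'that example is bad',
            'amp is a word']
print(remove_words_nested(in_lines, ['amp', ':']))
# ['this is', 'that is bad', 'is a word']
```

It gives the same output as the loop version, but the explicit loop is easier to read and to debug.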