I am trying to make a program that loops through a list of headlines and removes any headline that is similar to another headline in the rest of the list.
```python
# Loop through headlines and remove over 50% similar ones
headlines = listHeadlines()
# headlines.append('Our plan is working says Hunt, as Bank raises interest rate to 5.25%')
print(len(headlines), headlines)
headlines_copy = list(headlines)
for headline in headlines_copy:
    for h in headlines_copy:
        if h == headline:
            pass
        elif areStringsSimilar(h, headline):
            del headlines[headlines_copy.index(headline)]
            break  # Exit the inner loop and go back to the outer one, because headline has been deleted from the list.
print(len(headlines), headlines)
```
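`areStringsSimilar` and `listHeadlines` aren't shown in the question. For anyone who wants to run the loop, here is a hypothetical stand-in for `areStringsSimilar`. It assumes that "over 50% similar" means a `difflib.SequenceMatcher` ratio above 0.5; the asker's real function may measure similarity differently.

```python
from difflib import SequenceMatcher


def areStringsSimilar(a: str, b: str, threshold: float = 0.5) -> bool:
    """Hypothetical stand-in for the question's helper (not the asker's code).

    Assumes "similar" means difflib's ratio() is above `threshold`.
    ratio() is 2*M/T, where M counts matching characters and T is the
    combined length of both strings, so 1.0 means identical.
    """
    return SequenceMatcher(None, a, b).ratio() > threshold


print(areStringsSimilar("Bank raises interest rate to 5.25%",
                        "Bank raises interest rates to 5.25%"))  # True
print(areStringsSimilar("abc", "xyz"))  # False
```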
The first `print(len(headlines), headlines)` works and prints `1248 [list]`, but then I get this error:
```
Traceback (most recent call last):
  File "/Users/[path]/main.py", line 95, in <module>
    del headlines[headlines_copy.index(headline)]
        ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: list assignment index out of range

Process finished with exit code 1
```
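The mismatch can be reproduced without the real data. The sketch below uses a hypothetical `buggy_remove` function and a toy case-insensitive comparison standing in for `areStringsSimilar`; it mirrors the question's loop, where positions come from the unmodified copy but every `del` shortens and shifts the list being deleted from.

```python
def buggy_remove(items, is_similar):
    """Mirror of the question's loop (hypothetical name, for illustration)."""
    items = list(items)       # our own list, so the caller's stays untouched
    items_copy = list(items)  # never modified, like headlines_copy
    for item in items_copy:
        for other in items_copy:
            if other == item:
                continue
            if is_similar(other, item):
                # index() is a position in the unmodified copy, but `items`
                # has already lost earlier elements, so it no longer lines up.
                del items[items_copy.index(item)]
                break
    return items


def same_ignoring_case(a, b):
    # Toy similarity test standing in for areStringsSimilar.
    return a.lower() == b.lower()


print(buggy_remove(["a", "A", "b"], same_ignoring_case))  # ['A']

try:
    buggy_remove(["a", "A", "b", "B"], same_ignoring_case)
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list assignment index out of range
```

With three items, the dissimilar `"b"` is deleted while the near-duplicate `"A"` survives. With four, the third deletion uses index 2 on a list that has only two elements left, which is the same `IndexError` as in the traceback.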
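The `IndexError` happens because `headlines_copy.index(headline)` gives a position in the unmodified copy, while every `del` shortens `headlines` and shifts the later items to the left. After the first deletion the positions no longer line up, so the wrong headlines get deleted, and eventually the index points past the end of `headlines`.

Rather than deleting the headlines you don't want, why not append the headlines you want to keep: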
```python
headlines = listHeadlines()
deduplicated = []
for candidate in headlines:
    # Keep the candidate only if it isn't similar to any headline already kept.
    if not any(areStringsSimilar(candidate, kept) for kept in deduplicated):
        deduplicated.append(candidate)
print(deduplicated)
```
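Wrapping the keep-list loop in a function makes it easy to try with any similarity test. `deduplicate` and its `are_similar` parameter are names introduced for this sketch, and the case-insensitive predicate is only a toy; with the question's code you would pass `areStringsSimilar`.

```python
from typing import Callable, Iterable, List


def deduplicate(headlines: Iterable[str],
                are_similar: Callable[[str, str], bool]) -> List[str]:
    """Keep a headline only if it isn't similar to one already kept."""
    kept: List[str] = []
    for candidate in headlines:
        # any() stops at the first kept headline that counts as similar
        if not any(are_similar(candidate, k) for k in kept):
            kept.append(candidate)
    return kept


def same_ignoring_case(a: str, b: str) -> bool:
    # Toy predicate for the demo; swap in areStringsSimilar for real data.
    return a.lower() == b.lower()


print(deduplicate(["Rates up", "RATES UP", "Markets fall"], same_ignoring_case))
# ['Rates up', 'Markets fall']
```

The first headline in each group of similar ones is the one kept, and the input list is never modified, so there are no indices to fall out of sync. Each candidate is compared only against the headlines kept so far, so for 1248 headlines that is at most 1248 × 1247 / 2 = 778,128 calls to the similarity function.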