My Python script parses titles and links from multiple RSS feeds. I store the titles in a list and want to make sure I never print duplicates. How do I do that?
#!/usr/bin/python
from twitter import *
from goose import Goose
import feedparser
import time
from pyshorteners import Shortener
import pause
import newspaper
dr = feedparser.parse("http://www.darkreading.com/rss_simple.asp")
sm = feedparser.parse("http://www.securitymagazine.com/rss/topic/2654-cyber-tactics.rss")
dr_posts = [
    "CISO Playbook: Games of War & Cyber Defenses",
    "SWIFT Confirms Cyber Heist At Second Bank; Researchers Tie Malware Code to Sony Hack",
    "The 10 Worst Vulnerabilities of The Last 10 Years",
    "GhostShell Leaks Data From 32 Sites In 'Light Hacktivism' Campaign",
    "OPM Breach: 'Cyber Sprint' Response More Like A Marathon",
    "Survey: Customers Lose Trust In Brands After A Data Breach",
    "Domain Abuse Sinks 'Anchors Of Trust'",
    "The 10 Worst Vulnerabilities of The Last 10 Years",
]
sm_posts = ["10 Steps to Building a Better Cybersecurity Plan"]
x = 1
while True:
    try:
        # Dark Reading feed
        drtitle = dr.entries[x]["title"]
        drlink = dr.entries[x]["link"]
        if drtitle in dr_posts:
            # Already posted: skip ahead one entry and print that instead
            x += 1
            drtitle = dr.entries[x]["title"]
            drlink = dr.entries[x]["link"]
            print(drtitle + "\n" + drlink)
            dr_posts.append(drtitle)
            x -= 1
            pause.seconds(10)
        else:
            print(drtitle + "\n" + drlink)
            dr_posts.append(drtitle)
            pause.seconds(10)
        # Security Magazine feed
        smtitle = sm.entries[x]["title"]
        smlink = sm.entries[x]["link"]
        if smtitle in sm_posts:
            # Already posted: skip ahead one entry and print that instead
            x += 1
            smtitle = sm.entries[x]["title"]
            smlink = sm.entries[x]["link"]
            print(smtitle + "\n" + smlink)
            sm_posts.append(smtitle)
            pause.seconds(10)
        else:
            print(smtitle + "\n" + smlink)
            sm_posts.append(smtitle)
            x += 1
            pause.seconds(10)
    except IndexError:
        # Ran past the end of one of the feeds
        print("FAILURE")
        break
For the time being it only skips the duplicate entry and moves on to the next one. That is a problem, because if there's another duplicate further down the RSS feed, I'll end up with even more duplicates.
You can use a set, since every element of a set is unique by definition. Convert your list to a set and then back to a list, and the result contains only unique values.
If you have a list l, you can remove its duplicates with:
l = list(set(l))
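One caveat: a set is unordered, so `list(set(l))` does not guarantee the titles come back in their original order. For the loop in the question it may be simpler to keep the already-printed titles in a set and check each entry against it as you go. Here is a minimal sketch of that idea. The helper name `unseen_entries` and the sample data are made up for illustration; with real feeds you would pass in `dr.entries` or `sm.entries`, assuming they support the same `["title"]` and `["link"]` lookups used in the question.

```python
def unseen_entries(entries, seen_titles):
    """Yield (title, link) for each entry whose title is not in seen_titles.

    seen_titles is a set and is updated in place, so a later call
    (for example after re-fetching the feed) skips everything already yielded.
    """
    for entry in entries:
        title = entry["title"]
        if title in seen_titles:
            continue  # duplicate: skip it, wherever it appears in the feed
        seen_titles.add(title)
        yield title, entry["link"]


# Hypothetical sample data standing in for dr.entries
sample_entries = [
    {"title": "A", "link": "http://example.com/a"},
    {"title": "B", "link": "http://example.com/b"},
    {"title": "A", "link": "http://example.com/a2"},
]

seen = {"B"}  # titles that were already posted
fresh = list(unseen_entries(sample_entries, seen))
for title, link in fresh:
    print(title + "\n" + link)
```

Because every entry is checked against the set, a duplicate further down the feed is skipped as well, and membership tests on a set stay fast as the collection of titles grows.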