Tags: python, list, python-2.7, trim

Remove items from a list that do not contain 'speeches'?


import urllib2
from bs4 import BeautifulSoup

url = 'http://www.millercenter.org/president/speeches'

conn = urllib2.urlopen(url)
html = conn.read()

miller_center_soup = BeautifulSoup(html)
links = miller_center_soup.find_all('a')

linklist = [tag.get('href') for tag in links if tag.get('href') is not None]
linklist = str(linklist)

end_of_links = [line for line in linklist if '/events/' in line]
print end_of_links

This is a tiny snippet of my output (saved in a Python list).

['/events/2015/one-nation-under-god-how-corporate-america-invented-christian-america', 
'/events/2015/a-conversation-with-bernie-sanders', '#reagan', '#gwbush', '#obama',
'#top', '/president/obama/speeches/speech-4427', 'president/obama/speeches/speech-4430', ...]

I want to delete every item in the list that does not contain 'speeches'. I've tried filter() and another list comprehension, but neither has worked so far. I don't understand why the end_of_links variable does not give the expected result; the approach seems intuitive to me, at least.


Solution

  • li = ['/events/2015/one-nation-under-god-how-corporate-america-invented-christian-america', '/events/2015/a-conversation-with-bernie-sanders', '#reagan', '#gwbush', '#obama', '#top', '/president/obama/speeches/speech-4427', 'president/obama/speeches/speech-4430']

    import re

    li = [x for x in li if re.search('speeches', x)]

    print(li)

    ['/president/obama/speeches/speech-4427', 'president/obama/speeches/speech-4430']
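
As for why end_of_links comes back empty in the question: a likely cause is the line linklist = str(linklist). It turns the whole list into one string, so the comprehension then loops over single characters instead of hrefs, and no single character can contain '/events/'. The question's filter also tests for '/events/' rather than 'speeches'. The sketch below reproduces both behaviours on a few hrefs copied from the question's output; the names as_text, broken and speech_links are only illustrative, and a plain substring test with `in` is used here as an alternative to re.search.

```python
# Sample hrefs copied from the output shown in the question.
linklist = [
    '/events/2015/a-conversation-with-bernie-sanders',
    '#reagan',
    '#top',
    '/president/obama/speeches/speech-4427',
    'president/obama/speeches/speech-4430',
]

# str() collapses the list into a single string, so iterating over it
# yields one character at a time; no character contains '/events/'.
as_text = str(linklist)
broken = [line for line in as_text if '/events/' in line]

# Keeping the list as a list and testing each href keeps only the speeches.
speech_links = [href for href in linklist if 'speeches' in href]

print(broken)
print(speech_links)
```

For a fixed substring like this, `in` is enough; re.search becomes useful once the match needs an actual pattern.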