I have a piece of code that uses BeautifulSoup to scrape some specific URLs from a web page and store them in a list. I am trying to filter out the None values once and for all, and I have tried the following alternatives:
1
list_links = [link.get('data-href') for link in BSOBJ.find_all('a') if link is not None]
2
list_links = [link.get('data-href') for link in BSOBJ.find_all('a') if link != None]
In both of them I still get None values, so after the list is created I delete them with this line:
list_links = list(filter(None, list_links))
But I would like to know why I can't filter them with the previous code, and whether there is a way to do it directly in the list comprehension.
The problem seems to be that link.get('data-href') sometimes returns None, namely for <a> tags that have no data-href attribute. Your conditions test whether link is None, but find_all never yields None values, so those filters never exclude anything; it is the .get() call afterwards that produces the Nones. To catch these cases, filter on the .get() result instead:

list_links = [link.get('data-href') for link in BSOBJ.find_all('a') if link.get('data-href') is not None]

and there should be no more Nones in your list. If link itself can be None, you should of course keep filtering for that as well.
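A minimal runnable sketch of why this works, using plain dicts as stand-ins for BeautifulSoup tags (dict.get, like Tag.get, returns None for a missing key, so the example runs without bs4; the tags and URLs are made up for illustration):

```python
# Stand-ins for BeautifulSoup Tag objects (hypothetical data):
# dict.get, like Tag.get, returns None when the attribute is missing.
tags = [
    {"data-href": "https://example.com/a"},
    {"href": "/plain-link"},               # no data-href -> .get() returns None
    {"data-href": "https://example.com/b"},
]

# Filter on the .get() result, not on the tag itself:
list_links = [t.get("data-href") for t in tags if t.get("data-href") is not None]
print(list_links)  # ['https://example.com/a', 'https://example.com/b']

# Python 3.8+: an assignment expression avoids calling .get() twice per tag.
list_links_walrus = [href for t in tags if (href := t.get("data-href")) is not None]
print(list_links_walrus)  # same result, one lookup per tag
```

The walrus-operator version is mainly a micro-optimization; the plain double-.get() comprehension is usually clear enough.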