I am trying to exclude headlines containing question marks and colons from my results in Python, but they keep showing up in the final output. The results are filtered for None correctly, but not for punctuation.
Any help would be appreciated.
# Scrape BBC for headline text
url = 'https://www.bbc.co.uk/news'
res = requests.get(url)
html_page = res.content
soup = BeautifulSoup(html_page, 'html.parser')
tags = soup.find_all(class_='gs-c-promo-heading__title')
#print(headlines)
headlines = list()
for i in tags:
    if i.string is not None:
        if i.string != ":":
            if i.string != "?":
                headlines.append(i.string)
You are comparing the whole string against a single character, but what you want to know is whether the string contains that character. If you really want to filter that way, use not in to do the job:
if ':' not in i.string:
    if '?' not in i.string:
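To make the difference concrete, here is a minimal sketch that runs without any network access. The sample headlines are made up for illustration and are not real BBC output.

```python
# Hypothetical sample headlines (assumed values, for illustration only)
samples = [
    "Budget 2024: what changed",
    "Is the heatwave over?",
    "Storm hits coast",
]

# Equality compares the entire string, so this is True for every sample:
# none of them is exactly the one-character string ":".
passes_equality_check = [s != ":" for s in samples]
print(passes_equality_check)

# Membership checks whether the character occurs anywhere in the string.
kept = [s for s in samples if ":" not in s and "?" not in s]
print(kept)
```

With the equality test nothing is ever filtered out, while the membership test keeps only the headline that contains neither character.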
The problem with filtering this way is that you skip those headlines entirely. I think it would be much better to keep every result and clean it in the loop by replacing the unwanted characters:
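for i in tags:
    # i.string is None for tags with nested markup, so guard before calling replace
    if i.string is not None:
        print(i.string.replace(':', '').replace('?', ''))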
There may be a better way using a regex if you want to strip more characters.
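As a rough sketch of the regex idea, the snippet below strips a set of characters in one pass with re.sub. The helper name clean_headline, the PUNCT pattern, and the sample strings are all made up for illustration; extend the character class to cover whatever else you want to remove.

```python
import re

# Character class listing everything to strip (assumed set: colon and question mark)
PUNCT = re.compile(r"[:?]")

def clean_headline(text):
    """Remove every ':' and '?' from text and trim surrounding whitespace."""
    return PUNCT.sub("", text).strip()

# Hypothetical sample headlines, not real BBC output
samples = ["Budget 2024: what changed", "Is the heatwave over?", "Storm hits coast"]
cleaned = [clean_headline(s) for s in samples]
print(cleaned)
```

In the scraping loop this would be called as clean_headline(i.string) after the None check, so no headline is dropped.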
Example
import requests
from bs4 import BeautifulSoup

# Fetch and parse the BBC News front page
url = 'https://www.bbc.co.uk/news'
res = requests.get(url)
html_page = res.content
soup = BeautifulSoup(html_page, 'html.parser')
tags = soup.find_all(class_='gs-c-promo-heading__title')

headlines = list()
for i in tags:
    # Skip tags without a single text child
    if i.string is not None:
        # Keep only headlines that contain neither ':' nor '?'
        if ':' not in i.string:
            if '?' not in i.string:
                headlines.append(i.string)
print(headlines)