BeautifulSoup - Can't filter list results by punctuation


I am trying to exclude question marks and colons from my results in Python, but they keep showing up in the final output. The filter on None works, but the punctuation filter does not.

Any help would be appreciated.

import requests
from bs4 import BeautifulSoup

# Scrape BBC for headline text
url = 'https://www.bbc.co.uk/news'
res = requests.get(url)
html_page = res.content
soup = BeautifulSoup(html_page, 'html.parser')

tags = soup.find_all(class_='gs-c-promo-heading__title')
headlines = list()

for i in tags:
    if i.string is not None:
        if i.string != ":":
            if i.string != "?":
                headlines.append(i.string)

Solution

  • You are comparing the whole string against a single character with !=, but what you actually want to know is whether the string contains that character. If you want to keep this filtering approach, use not in instead:

    if ':' not in i.string:
        if '?' not in i.string:
            headlines.append(i.string)
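
To see the difference described above, here is a minimal sketch that runs both tests on a few made-up headline strings. The sample list and the variable names are assumptions for illustration, not scraped data.

```python
# Hypothetical headline strings, made up for illustration (not scraped data).
samples = ["Budget: what changed", "Who won the match?", "Storm hits coast", "?"]

# Equality test: only drops strings that are exactly ":" or "?".
kept_by_equality = [s for s in samples if s != ":" and s != "?"]

# Membership test: drops any string that contains ":" or "?".
kept_by_membership = [s for s in samples if ":" not in s and "?" not in s]

print(kept_by_equality)
print(kept_by_membership)
```

The equality test only removes the string that is exactly one punctuation character, which is why headlines that merely contain a colon or question mark still got through.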
    

    The problem with this approach is that it skips every headline that contains one of those characters. It is probably better to keep the headlines and strip the characters inside the loop:

    for i in tags:
        if i.string is not None:
            print(i.string.replace(':', '').replace('?', ''))
    

    If you want to remove more characters, a regular expression is probably a cleaner option.
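
A rough sketch of the regex idea, assuming only colons and question marks need to go. The helper name clean_headline and the sample strings are hypothetical; the only library call used is re.sub via a compiled pattern from the standard library.

```python
import re

# Character class listing the punctuation to strip; extend it as needed.
PUNCTUATION = re.compile(r"[:?]")

def clean_headline(text):
    """Return text with every ':' and '?' removed and outer whitespace trimmed."""
    return PUNCTUATION.sub("", text).strip()

# Hypothetical headline strings, made up for illustration.
samples = ["Budget: what changed", "Who won the match?"]
cleaned = [clean_headline(s) for s in samples]
print(cleaned)
```

In the scraping loop this would be called as clean_headline(i.string) after the None check, so no headline is skipped. Adding more characters only means extending the character class.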

    Example

    import requests
    from bs4 import BeautifulSoup

    # Scrape the BBC News front page for headline text
    url = 'https://www.bbc.co.uk/news'
    res = requests.get(url)
    html_page = res.content
    soup = BeautifulSoup(html_page, 'html.parser')

    tags = soup.find_all(class_='gs-c-promo-heading__title')
    headlines = list()

    for i in tags:
        # Skip tags without a single text node and headlines containing ':' or '?'
        if i.string is not None:
            if ':' not in i.string:
                if '?' not in i.string:
                    headlines.append(i.string)

    print(headlines)