
Check whether words from a list appear in the strings of another list (Python)


I scraped all the headlines from the New York Times homepage and want to count how many of them mention a certain word. In this particular case, I want to know how many headlines mention either the coronavirus or Trump. This is my code, but it doesn't work: 'number' keeps the value I assigned to it before the while loop.

import requests
from bs4 import BeautifulSoup

url = 'https://www.nytimes.com'
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
a = soup.findAll("h2", class_="esl82me0")

for story_heading in a:
    print(story_heading.contents[0])

lijst = ["trump", "Trump", "Corona", "COVID", "virus", "Virus", "Coronavirus", "COVID-19"]
number = 0
run = 0

while run < len(a)+1:
    run += 1
     if any(lijst in s for s in a)
        number += 1

print("\nTrump or the Corona virus have been mentioned", number, "times.")

So basically I want the variable 'number' to increase by 1 whenever a headline (an entry in the list a) contains the word Trump, the word Coronavirus, or both.

Does anyone know how to do this?


Solution

  • In general, I recommend putting more thought into naming variables. I like how you tried to print the story headings. The line if any(lijst in s for s in a) does not do what you think it does: you need to iterate over each word in a single h2 instead. As posted, that line also lacks its trailing colon and is indented one space too far, so it would not even parse. The any function is just shorthand for the following:

    def any(iterable):
        for element in iterable:
            if element:
                return True
        return False
    

    In other words, you're trying to see if an entire list is in an h2 element, which will never be true. Here is an example fix.

    import requests
    from bs4 import BeautifulSoup
    
    url = 'https://www.nytimes.com'
    r = requests.get(url)
    soup = BeautifulSoup(r.text, "html.parser")
    h2s = soup.find_all("h2", class_="esl82me0")  # find_all is the current name for findAll
    
    for story_heading in h2s:
        print(story_heading.contents[0])
    
    keywords = ["trump", "Trump", "Corona", "COVID", "virus", "Virus", "Coronavirus", "COVID-19"]
    number = 0
    
    # Count every word that exactly matches a keyword; a headline that
    # contains two keywords therefore adds 2 to the total.
    for h2 in h2s:
        headline = h2.text
        words_in_headline = headline.split(" ")
        for word in words_in_headline:
            if word in keywords:
                number += 1
    print("\nTrump or the Corona virus have been mentioned", number, "times.")
    

    Output

    Trump or the Corona virus have been mentioned 7 times.
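
Note that the loop above counts matching words, so a headline that contains two keywords is counted twice, and split(" ") misses forms such as "Coronavirus:" where punctuation is attached to the word. If the goal is to count headlines, as the question describes, one possible approach is a case-insensitive substring check with any. The following is only a minimal sketch: the sample headlines are made up to stand in for the scraped h2 text, and mentions_keyword is a hypothetical helper name, not part of the original code.

```python
# Hypothetical sample data standing in for [h2.text for h2 in h2s].
headlines = [
    "Trump Says Virus Will Fade",
    "Coronavirus: What We Know So Far",
    "Stocks Rally as Markets Reopen",
    "How COVID-19 Changed Commuting",
]

# Lowercase keywords only; the headline is lowercased before matching,
# so separate "trump"/"Trump" entries are not needed.
keywords = ["trump", "corona", "covid", "virus"]


def mentions_keyword(headline, keywords):
    """Return True if at least one keyword occurs anywhere in the headline."""
    lowered = headline.lower()
    return any(keyword in lowered for keyword in keywords)


# Each headline contributes at most 1, however many keywords it contains.
number = sum(1 for headline in headlines if mentions_keyword(headline, keywords))

print("\nTrump or the Corona virus have been mentioned in", number, "headlines.")
```

Substring matching is deliberately loose: it catches "Coronavirus:" and "COVID-19", but it would also match unrelated words such as "trumpet" or "antivirus". If that matters, matching on word boundaries with a regular expression may be a better fit.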