Tags: python, beautifulsoup, attributes

BeautifulSoup NoneType error because an additional class is fetched


I am trying to scrape a website, but there is a problem. My code is:

articles = soup.find_all("article", {"class": "newspost"})

but for some reason the results also include articles with the class "newspost review".
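This happens because BeautifulSoup treats class as a multi-valued attribute: class="newspost review" is stored as the list ['newspost', 'review'], and a class filter matches when any one of those values matches. A minimal sketch of the behavior, using small made-up HTML (the article contents are hypothetical, not copied from the site):

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the two kinds of <article> on the page.
html = """
<article class="newspost"><a class="newslink" href="/1/news">News item</a></article>
<article class="newspost review"><a class="newslink" href="/2/review">Review item</a></article>
"""

soup = BeautifulSoup(html, "html.parser")

# "class" is multi-valued, so BeautifulSoup stores it as a list of tokens.
for article in soup.find_all("article"):
    print(article["class"])
# ['newspost']
# ['newspost', 'review']

# A class filter matches when ANY token equals "newspost", so both articles match.
matches = soup.find_all("article", {"class": "newspost"})
print(len(matches))  # 2
```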

I tried to exclude this specific class, but it didn't work. I then tried an if clause to keep only the results where the time element exists, but that didn't work either.

I guess excluding a specific class should be easier, but I don't know how to do that.
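Two ways to exclude the review class are sketched below on hypothetical sample HTML: a CSS selector with :not() (BeautifulSoup's select() delegates to the soupsieve package, which supports it), or the original find_all() followed by a filter on the class list. Neither is taken from the site's real markup; treat them as illustrations.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one plain news post and one review.
html = """
<article class="newspost"><time datetime="2023-10-11T08:24:17+00:00">Oct 11th</time></article>
<article class="newspost review"><span>Review, no time tag</span></article>
"""
soup = BeautifulSoup(html, "html.parser")

# Option 1: CSS selector; keep articles with class "newspost" but not "review".
news_css = soup.select("article.newspost:not(.review)")

# Option 2: keep find_all(), then drop anything whose class list contains "review".
news_filtered = [
    art
    for art in soup.find_all("article", class_="newspost")
    if "review" not in art.get("class", [])
]

print(len(news_css), len(news_filtered))  # 1 1
```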

My code:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = "https://www.techpowerup.com/"

response = requests.get(url)

if response.status_code == 200:
    soup = BeautifulSoup(response.content, "html.parser")
    
    articles = soup.find_all("article", {"class": "newspost"})
    
    for article in articles:
        title = article.find("a", class_ = "newslink").text.strip()
        article_url = urljoin(url,article.find("a")["href"])
        date = article.find("time")
        if date is not None:
            print(date)
        else:
            print("Not found")
        
        print(title)
        print(article_url)

This code works, but the date output looks bad (print(date) prints the whole <time> tag), and when I try to use select_one("time").get('datetime') I get a NoneType error.
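select_one() (like find()) returns None when nothing matches, so on an article without a <time> element, select_one("time").get('datetime') becomes None.get(...) and raises AttributeError: 'NoneType' object has no attribute 'get'. A minimal sketch of guarding against that, again on hypothetical sample HTML where the review has no <time> tag:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second article (a review) has no <time> element.
html = """
<article class="newspost"><time datetime="2023-10-11T08:24:17+00:00">Oct 11th 2023 08:24</time></article>
<article class="newspost review"><span>No time tag here</span></article>
"""
soup = BeautifulSoup(html, "html.parser")

dates = []
for article in soup.find_all("article", class_="newspost"):
    time_tag = article.select_one("time")  # None when the article has no <time>
    # Only read the attribute when the tag exists; calling .get() on None
    # is what raises the NoneType error.
    date = time_tag.get("datetime") if time_tag is not None else None
    dates.append(date)

print(dates)  # ['2023-10-11T08:24:17+00:00', None]
```

Reading the datetime attribute (or calling time_tag.get_text(strip=True)) also avoids printing the whole tag.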


Solution

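  • This is one way to get the articles you want while skipping the review ones. Note that the selector article[class^="newspost"] on its own still matches "newspost review", because that class string also starts with "newspost"; it is the data-id check that leaves the reviews out: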

    from bs4 import BeautifulSoup as bs
    import requests
    import pandas as pd
    
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36'
    }
    
    big_list = []
    r = requests.get("https://www.techpowerup.com/", headers=headers)
    soup = bs(r.text, 'html.parser')
    
    # Every <article> whose class attribute starts with "newspost" (reviews included)
    articles = soup.select('article[class^="newspost"]')
    # The data-id filter is what skips the review entries
    for art in [x for x in articles if x.has_attr("data-id")]:
        # Look each element up once; either may be missing, so guard against None
        link_tag = art.select_one('h1 a[class="newslink"]')
        time_tag = art.select_one('time')
        a_title = link_tag.get_text(strip=True, separator=' ') if link_tag else None
        a_link = 'https://www.techpowerup.com' + link_tag.get('href') if link_tag else None
        a_timestamp = time_tag.get('datetime') if time_tag else None
        big_list.append((a_title, a_timestamp, a_link))
    df = pd.DataFrame(big_list, columns=['Title', 'Timestamp', 'Url'])
    print(df)
    

    Result in terminal:

        Title   Timestamp   Url
    0   MSI Announces the PRO MP251 Series - The World...   2023-10-11T08:24:17+00:00   https://www.techpowerup.com/314596/msi-announc...
    1   Global Notebook Market to Rebound in 2024, Pro...   2023-10-11T08:08:48+00:00   https://www.techpowerup.com/314595/global-note...
    2   ASUS Showcases Cutting-Edge Cloud Solutions at...   2023-10-11T08:05:47+00:00   https://www.techpowerup.com/314594/asus-showca...
    3   Logitech Introduces Ergonomic Wave Keys and Wa...   2023-10-11T07:15:30+00:00   https://www.techpowerup.com/314593/logitech-in...
    4   MSI Announces Thrilling New Partnership with S...   2023-10-11T06:51:41+00:00   https://www.techpowerup.com/314589/msi-announc...
    5   Qualcomm Oryon PC SoC to be Rebranded as "Snap...   2023-10-11T06:03:35+00:00   https://www.techpowerup.com/314588/qualcomm-or...
    6   Blast Through Hyperspace with Logitech G's Ret...   2023-10-10T19:36:40+00:00   https://www.techpowerup.com/314579/blast-throu...
    7   D-Link's DWA-F18 VR Air Bridge Tailored for Me...   2023-10-10T18:05:48+00:00   https://www.techpowerup.com/314577/d-links-dwa...
    8   Sony Announces Refreshed, Slimmer PlayStation ...   2023-10-10T17:54:10+00:00   https://www.techpowerup.com/314576/sony-announ...
    9   Sony Unveils INZONE Buds and the INZONE H5 Hea...   2023-10-10T17:40:49+00:00   https://www.techpowerup.com/314574/sony-unveil...
    10  Alan Wake 2 GeForce RTX 40 Series Bundle Avail...   2023-10-10T15:30:56+00:00   https://www.techpowerup.com/314570/alan-wake-2...
    11  Lords of the Fallen Launches With DLSS 3 On Oc...   2023-10-10T15:26:12+00:00   https://www.techpowerup.com/314569/lords-of-th...
    12  EnGenius Unveils the Most Intuitive Security G...   2023-10-10T14:57:13+00:00   https://www.techpowerup.com/314568/engenius-un...
    13  JEDEC and Open Compute Project Foundation Pave...   2023-10-10T14:49:23+00:00   https://www.techpowerup.com/314567/jedec-and-o...
    14  AMD to Acquire Open-Source AI Software Expert ...   2023-10-10T14:38:34+00:00   https://www.techpowerup.com/314566/amd-to-acqu...
    15  Micron Delivers High-Speed 7,200 MT/s DDR5 Mem...   2023-10-10T14:36:07+00:00   https://www.techpowerup.com/314565/micron-deli...
    16  NVIDIA GeForce 537.58 WHQL Game Ready Drivers ...   2023-10-10T14:13:15+00:00   https://www.techpowerup.com/314564/nvidia-gefo...
    17  CORSAIR Launches New LCD-Equipped AIO CPU Cool...   2023-10-10T13:10:15+00:00   https://www.techpowerup.com/314561/corsair-lau...
    18  CORSAIR Launches K70 CORE, The New Standard fo...   2023-10-10T13:06:45+00:00   https://www.techpowerup.com/314560/corsair-lau...
    19  Intel Launches Arc A580 Graphics Card for 1080...   2023-10-10T13:00:01+00:00   https://www.techpowerup.com/314559/intel-launc...
    20  Starfield Gets New Update 1.7.36, Improving In...   2023-10-10T12:24:06+00:00   https://www.techpowerup.com/314558/starfield-g...
    21  DataLocker Introduces Sentry 5: The Ultimate H...   2023-10-10T12:23:18+00:00   https://www.techpowerup.com/314557/datalocker-...
    22  Steam Next Fest: October '23 Edition is on NOW  2023-10-10T12:10:32+00:00   https://www.techpowerup.com/314556/steam-next-...
    23  be quiet! Introduces Dark Rock Elite and Dark ...   2023-10-10T12:08:33+00:00   https://www.techpowerup.com/314555/be-quiet-in...
    24  Unity CEO Steps Down After Engine Runtime Fee ...   2023-10-10T11:59:54+00:00   https://www.techpowerup.com/314554/unity-ceo-s...
    25  Intel Publishes 14th Gen Core Processor Model ...   2023-10-10T09:36:46+00:00   https://www.techpowerup.com/314551/intel-publi...
    26  Intel Releases Arc GPU Graphics Drivers 101.48...   2023-10-10T03:42:43+00:00   https://www.techpowerup.com/314548/intel-relea...
    27  Logitech's Upcoming Ergonomic Keyboard Leaks A...   2023-10-09T18:57:58+00:00   https://www.techpowerup.com/314540/logitechs-u...
    28  TRIBIT Launches a New Portable Speaker, StormB...   2023-10-09T16:46:47+00:00   https://www.techpowerup.com/314537/tribit-laun...
    29  Last Train Home Gets New Demo Before November ...   2023-10-09T14:25:21+00:00   https://www.techpowerup.com/314532/last-train-...
    30  Make Way Demo Smashes into Steam Next Fest  2023-10-09T14:19:49+00:00   https://www.techpowerup.com/314531/make-way-de...
    31  Kingston FURY DDR4 UDIMMs Get a New Look    2023-10-09T13:37:57+00:00   https://www.techpowerup.com/314528/kingston-fu...
    32  Flexxon Announces Xsign, a Physical Security K...   2023-10-09T09:43:44+00:00   https://www.techpowerup.com/314520/flexxon-ann...
    33  Microsoft to Unveil Custom AI Chips to Fight N...   2023-10-09T07:00:36+00:00   https://www.techpowerup.com/314508/microsoft-t...
    34  This Week in Gaming (Week 41)   2023-10-08T11:14:23+00:00   https://www.techpowerup.com/314503/this-week-i...
    
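The Timestamp column holds ISO 8601 strings taken from the datetime attribute. If you want a friendlier display, the standard library can parse and reformat them; this is a minimal sketch using two values from the output above, and the literal "UTC" suffix is only correct because these offsets are +00:00.

```python
from datetime import datetime

# Two timestamps copied from the output above.
raw = ["2023-10-11T08:24:17+00:00", "2023-10-10T19:36:40+00:00"]

# fromisoformat() accepts the "YYYY-MM-DDTHH:MM:SS+HH:MM" form (Python 3.7+).
parsed = [datetime.fromisoformat(ts) for ts in raw]

# Reformat; "UTC" is hard-coded because every offset here is +00:00.
pretty = [dt.strftime("%Y-%m-%d %H:%M UTC") for dt in parsed]
print(pretty)  # ['2023-10-11 08:24 UTC', '2023-10-10 19:36 UTC']
```

With pandas, pd.to_datetime(df['Timestamp']) should perform the same conversion for the whole column.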

    See the BeautifulSoup documentation for more details.