I am trying to scrape a website, but I have a problem. My code selects the articles like this:
    articles = soup.find_all("article", {"class": "newspost"})
For some reason, the results also include elements whose class is "newspost review".
I tried to exclude that specific class, but it didn't work. Then I tried an if clause to keep results only when the time element exists, but that didn't work either.
I guess excluding a specific class should be the easier approach, but I don't know how to do it.
My code:
    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin

    url = "https://www.techpowerup.com/"
    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, "html.parser")
        articles = soup.find_all("article", {"class": "newspost"})
        for article in articles:
            title = article.find("a", class_="newslink").text.strip()
            article_url = urljoin(url, article.find("a")["href"])
            date = article.find("time")
            if date is not None:
                print(date)
            else:
                print("Not found")
            print(title)
            print(article_url)
This code works, but the date output looks bad, and when I try to use select_one("time").get('datetime') instead, I get a NoneType error.
This is one way to get the articles you want while avoiding the review ones:
    from bs4 import BeautifulSoup as bs
    import requests
    import pandas as pd

    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36'
    }

    big_list = []
    r = requests.get("https://www.techpowerup.com/", headers=headers)
    soup = bs(r.text, 'html.parser')

    # Every <article> whose class attribute starts with "newspost"
    articles = soup.select('article[class^="newspost"]')
    # Keep only articles that carry a data-id attribute; this is the filter used here to skip the review ones
    for art in [x for x in articles if x.has_attr("data-id")]:
        a_title = art.select_one('h1 a[class="newslink"]').get_text(strip=True, separator=' ') if art.select_one('h1 a[class="newslink"]') else None
        a_link = 'https://www.techpowerup.com' + art.select_one('h1 a[class="newslink"]').get('href') if art.select_one('h1 a[class="newslink"]') else None
        # Guard against articles without a <time> tag to avoid the NoneType error
        a_timestamp = art.select_one('time').get('datetime') if art.select_one('time') else None
        big_list.append((a_title, a_timestamp, a_link))

    df = pd.DataFrame(big_list, columns=['Title', 'Timestamp', 'Url'])
    print(df)
Result in terminal:
Title Timestamp Url
0 MSI Announces the PRO MP251 Series - The World... 2023-10-11T08:24:17+00:00 https://www.techpowerup.com/314596/msi-announc...
1 Global Notebook Market to Rebound in 2024, Pro... 2023-10-11T08:08:48+00:00 https://www.techpowerup.com/314595/global-note...
2 ASUS Showcases Cutting-Edge Cloud Solutions at... 2023-10-11T08:05:47+00:00 https://www.techpowerup.com/314594/asus-showca...
3 Logitech Introduces Ergonomic Wave Keys and Wa... 2023-10-11T07:15:30+00:00 https://www.techpowerup.com/314593/logitech-in...
4 MSI Announces Thrilling New Partnership with S... 2023-10-11T06:51:41+00:00 https://www.techpowerup.com/314589/msi-announc...
5 Qualcomm Oryon PC SoC to be Rebranded as "Snap... 2023-10-11T06:03:35+00:00 https://www.techpowerup.com/314588/qualcomm-or...
6 Blast Through Hyperspace with Logitech G's Ret... 2023-10-10T19:36:40+00:00 https://www.techpowerup.com/314579/blast-throu...
7 D-Link's DWA-F18 VR Air Bridge Tailored for Me... 2023-10-10T18:05:48+00:00 https://www.techpowerup.com/314577/d-links-dwa...
8 Sony Announces Refreshed, Slimmer PlayStation ... 2023-10-10T17:54:10+00:00 https://www.techpowerup.com/314576/sony-announ...
9 Sony Unveils INZONE Buds and the INZONE H5 Hea... 2023-10-10T17:40:49+00:00 https://www.techpowerup.com/314574/sony-unveil...
10 Alan Wake 2 GeForce RTX 40 Series Bundle Avail... 2023-10-10T15:30:56+00:00 https://www.techpowerup.com/314570/alan-wake-2...
11 Lords of the Fallen Launches With DLSS 3 On Oc... 2023-10-10T15:26:12+00:00 https://www.techpowerup.com/314569/lords-of-th...
12 EnGenius Unveils the Most Intuitive Security G... 2023-10-10T14:57:13+00:00 https://www.techpowerup.com/314568/engenius-un...
13 JEDEC and Open Compute Project Foundation Pave... 2023-10-10T14:49:23+00:00 https://www.techpowerup.com/314567/jedec-and-o...
14 AMD to Acquire Open-Source AI Software Expert ... 2023-10-10T14:38:34+00:00 https://www.techpowerup.com/314566/amd-to-acqu...
15 Micron Delivers High-Speed 7,200 MT/s DDR5 Mem... 2023-10-10T14:36:07+00:00 https://www.techpowerup.com/314565/micron-deli...
16 NVIDIA GeForce 537.58 WHQL Game Ready Drivers ... 2023-10-10T14:13:15+00:00 https://www.techpowerup.com/314564/nvidia-gefo...
17 CORSAIR Launches New LCD-Equipped AIO CPU Cool... 2023-10-10T13:10:15+00:00 https://www.techpowerup.com/314561/corsair-lau...
18 CORSAIR Launches K70 CORE, The New Standard fo... 2023-10-10T13:06:45+00:00 https://www.techpowerup.com/314560/corsair-lau...
19 Intel Launches Arc A580 Graphics Card for 1080... 2023-10-10T13:00:01+00:00 https://www.techpowerup.com/314559/intel-launc...
20 Starfield Gets New Update 1.7.36, Improving In... 2023-10-10T12:24:06+00:00 https://www.techpowerup.com/314558/starfield-g...
21 DataLocker Introduces Sentry 5: The Ultimate H... 2023-10-10T12:23:18+00:00 https://www.techpowerup.com/314557/datalocker-...
22 Steam Next Fest: October '23 Edition is on NOW 2023-10-10T12:10:32+00:00 https://www.techpowerup.com/314556/steam-next-...
23 be quiet! Introduces Dark Rock Elite and Dark ... 2023-10-10T12:08:33+00:00 https://www.techpowerup.com/314555/be-quiet-in...
24 Unity CEO Steps Down After Engine Runtime Fee ... 2023-10-10T11:59:54+00:00 https://www.techpowerup.com/314554/unity-ceo-s...
25 Intel Publishes 14th Gen Core Processor Model ... 2023-10-10T09:36:46+00:00 https://www.techpowerup.com/314551/intel-publi...
26 Intel Releases Arc GPU Graphics Drivers 101.48... 2023-10-10T03:42:43+00:00 https://www.techpowerup.com/314548/intel-relea...
27 Logitech's Upcoming Ergonomic Keyboard Leaks A... 2023-10-09T18:57:58+00:00 https://www.techpowerup.com/314540/logitechs-u...
28 TRIBIT Launches a New Portable Speaker, StormB... 2023-10-09T16:46:47+00:00 https://www.techpowerup.com/314537/tribit-laun...
29 Last Train Home Gets New Demo Before November ... 2023-10-09T14:25:21+00:00 https://www.techpowerup.com/314532/last-train-...
30 Make Way Demo Smashes into Steam Next Fest 2023-10-09T14:19:49+00:00 https://www.techpowerup.com/314531/make-way-de...
31 Kingston FURY DDR4 UDIMMs Get a New Look 2023-10-09T13:37:57+00:00 https://www.techpowerup.com/314528/kingston-fu...
32 Flexxon Announces Xsign, a Physical Security K... 2023-10-09T09:43:44+00:00 https://www.techpowerup.com/314520/flexxon-ann...
33 Microsoft to Unveil Custom AI Chips to Fight N... 2023-10-09T07:00:36+00:00 https://www.techpowerup.com/314508/microsoft-t...
34 This Week in Gaming (Week 41) 2023-10-08T11:14:23+00:00 https://www.techpowerup.com/314503/this-week-i...
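As for why the original find_all call also returned the "newspost review" elements: BeautifulSoup treats class as a multi-valued attribute, so searching for "newspost" matches any tag that has that class among its classes. If you would rather exclude by class than filter on data-id, a CSS :not() selector or an exact class-list comparison should also work. The sketch below runs on a small made-up HTML snippet (the markup, titles and hrefs are assumptions for illustration, not the real page), so treat it as a starting point rather than a drop-in replacement:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may be structured differently.
html = """
<article class="newspost" data-id="1"><h1><a class="newslink" href="/1/a">News A</a></h1></article>
<article class="newspost review"><h1><a class="newslink" href="/2/b">Review B</a></h1></article>
<article class="newspost" data-id="3"><h1><a class="newslink" href="/3/c">News C</a></h1></article>
"""
soup = BeautifulSoup(html, "html.parser")

# Matches every article whose class list contains "newspost", including "newspost review"
all_posts = soup.find_all("article", {"class": "newspost"})

# Option 1: CSS selector that drops articles also carrying the "review" class
news_only = soup.select("article.newspost:not(.review)")

# Option 2: require the class list to be exactly ["newspost"]
exact = soup.find_all(
    lambda tag: tag.name == "article" and tag.get("class") == ["newspost"]
)

news_titles = [a.select_one("a.newslink").get_text(strip=True) for a in news_only]
print(len(all_posts), len(exact), news_titles)
```

Option 1 keeps working if the site adds further classes to normal news posts, while option 2 would then stop matching them, so the selector is probably the more forgiving choice.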
See BeautifulSoup documentation for more details.
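Regarding the NoneType error and the date formatting from the question: select_one("time") returns None when an article has no time tag, so calling .get() on it fails. One way to handle both issues is a small helper that checks for the tag first and then parses the datetime attribute with the standard library. The helper name extract_timestamp and the HTML snippet are made up for this sketch; only the timestamp value is taken from the output above:

```python
from datetime import datetime
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the second article has no <time> tag.
html = """
<article class="newspost"><time datetime="2023-10-11T08:24:17+00:00">Wednesday</time></article>
<article class="newspost review"></article>
"""

def extract_timestamp(article):
    """Return the article's <time datetime="..."> value as a datetime, or None."""
    time_tag = article.select_one("time")
    if time_tag is None or not time_tag.has_attr("datetime"):
        return None
    # ISO 8601 strings with a +00:00 offset are parsed into an aware datetime
    return datetime.fromisoformat(time_tag["datetime"])

soup = BeautifulSoup(html, "html.parser")
stamps = [extract_timestamp(a) for a in soup.select("article")]

# Format for display, falling back to a placeholder when no timestamp exists
formatted = [s.strftime("%Y-%m-%d %H:%M") if s else "Not found" for s in stamps]
print(formatted)
```

Returning None instead of raising keeps the loop over articles simple, and the strftime pattern can be swapped for whatever display format you prefer.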