I am trying to follow a video tutorial that is a bit outdated. In the video, `href = links[idx].get('href')` grabs the link, but when I use it here it doesn't work: it just returns `None`. If I use `.getText()` instead, it does grab the title.
The element containing both the href and the title is `<a href="https://mullvad.net/nl/blog/2023/2/2/stop-the-proposal-on-mass-surveillance-of-the-eu/">Stop the proposal on mass surveillance of the EU</a>`
Here's my code:
```python
import requests
from bs4 import BeautifulSoup

res = requests.get('https://news.ycombinator.com/news')
soup = BeautifulSoup(res.text, 'html.parser')
links = soup.select('.titleline')
votes = soup.select('.score')

def create_custom_hn(links, votes):
    hn = []
    for idx, item in enumerate(links):
        title = links[idx].getText()
        href = links[idx].get('href')
        print(href)
        #hn.append({'title': title, 'link': href})
    return hn

print(create_custom_hn(links, votes))
```
I tried to grab the link using `.get('href')`, as in the video, but every call prints `None`.
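A minimal offline reproduction of the behaviour described above, assuming (as the answer's selector also suggests) that Hacker News places each title link inside a `<span class="titleline">`. The HTML fragment, URL and title below are made up for illustration and are not copied from the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on one HN title cell; not the real page.
html = '''
<table>
  <tr class="athing" id="1">
    <td class="title">
      <span class="titleline"><a href="https://example.com/story">Example story</a></span>
    </td>
  </tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')

# '.titleline' matches the <span>, which has no href attribute of its own.
span = soup.select_one('.titleline')
print(span.name, span.get('href'))  # prints: span None

# getText() collects text from all descendants, so the title is still found.
print(span.getText(strip=True))  # prints: Example story

# Selecting the <a> child of the span exposes the href.
link = soup.select_one('.titleline > a')
print(link.get('href'), link.getText())  # prints: https://example.com/story Example story
```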
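`.titleline` matches the `<span>` that wraps each title link, not the `<a>` itself. The span has no `href` attribute, so `.get('href')` returns `None`, while `.getText()` still works because it collects the text of the nested `<a>`.

Try to select your elements more specifically, and avoid keeping separate lists for titles and scores: there is no need for that, and you would have to make sure both lists have the same length and order. You can get all the information in one go by selecting each `<tr>` with class `athing` together with the row that follows it, which holds the score: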
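```python
import requests
from bs4 import BeautifulSoup
from pprint import pprint

soup = BeautifulSoup(requests.get('https://news.ycombinator.com/news').text, 'html.parser')

data = []
# each story's title row is a <tr class="athing">
for row in soup.select('.athing'):
    # the <a> inside span.titleline holds both the title and the link
    anchor = row.select_one('.titleline > a')
    # the score is in the next <tr>; find_next_sibling skips whitespace text nodes
    subtext = row.find_next_sibling('tr')
    score = subtext.select_one('.score') if subtext else None
    data.append({
        'title': anchor.text,
        'link': anchor.get('href'),
        # rows without a score (e.g. job postings) get None
        'score': score.text if score else None,
    })
pprint(data, sort_dicts=False)
```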
Output (truncated):

```
[{'title': 'Stop the proposal on mass surveillance of the EU',
  'link': 'https://mullvad.net/nl/blog/2023/2/2/stop-the-proposal-on-mass-surveillance-of-the-eu/',
  'score': '287 points'},
 {'title': 'Bay 12 Games has made $7M from the Steam release of Dwarf Fortress',
  'link': 'http://www.bay12forums.com/smf/index.php?topic=181354.0',
  'score': '416 points'},
 {'title': "Google's OSS-Fuzz expands fuzz-reward program to $30000",
  'link': 'https://security.googleblog.com/2023/02/taking-next-step-oss-fuzz-in-2023.html',
  'score': '31 points'},
 {'title': "Connecticut Parents Arrested for Letting Kids Walk to Dunkin' Donuts",
  'link': 'https://reason.com/2023/01/30/dunkin-donuts-parents-arrested-kids-cops-freedom/',
  'score': '225 points'},
 {'title': 'Ronin 2.0 – open-source Ruby toolkit for security research and development',
  'link': 'https://ronin-rb.dev/blog/2023/02/01/ronin-2-0-0-finally-released.html',
  'score': '62 points'},...]
```
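The point about separate lists can be checked offline. The sketch below uses a hypothetical HTML fragment (titles, ids and URLs are invented) in which the middle story has no `<span class="score">`, as appears to be the case for HN job postings. Pairing the `.titleline > a` and `.score` results by index gives the job the next story's score and silently drops the last story, whereas looking up the score in each row's following `<tr>` keeps every entry aligned.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: story A, a job posting B without a score, story C.
html = '''
<table>
  <tr class="athing" id="1">
    <td class="title"><span class="titleline"><a href="https://example.com/a">Story A</a></span></td>
  </tr>
  <tr>
    <td class="subtext"><span class="score">10 points</span></td>
  </tr>
  <tr class="athing" id="2">
    <td class="title"><span class="titleline"><a href="https://example.com/job">Job B</a></span></td>
  </tr>
  <tr>
    <td class="subtext"><span class="age">1 hour ago</span></td>
  </tr>
  <tr class="athing" id="3">
    <td class="title"><span class="titleline"><a href="https://example.com/c">Story C</a></span></td>
  </tr>
  <tr>
    <td class="subtext"><span class="score">5 points</span></td>
  </tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')

# Separate lists: 3 titles but only 2 scores, so index pairing misaligns.
titles = soup.select('.titleline > a')
scores = soup.select('.score')
by_index = [(t.getText(), s.getText()) for t, s in zip(titles, scores)]

# Row-based: each score is taken from the <tr> right after its own title row.
by_row = []
for row in soup.select('.athing'):
    anchor = row.select_one('.titleline > a')
    subtext = row.find_next_sibling('tr')
    score = subtext.select_one('.score') if subtext else None
    by_row.append((anchor.getText(), score.getText() if score else None))

print(by_index)  # prints: [('Story A', '10 points'), ('Job B', '5 points')]
print(by_row)    # prints: [('Story A', '10 points'), ('Job B', None), ('Story C', '5 points')]
```

`find_next_sibling('tr')` is used rather than `next_sibling` because, with line breaks between rows, `next_sibling` can return a whitespace text node instead of the next `<tr>`.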