I am trying to parse some links from this site: https://news.ycombinator.com/
I want to select a specific table, which in the browser console I can get with:
document.querySelector("#hnmain > tbody > tr:nth-child(3) > td > table")
I know bs4 has some CSS selector limitations, but the problem is that I can't even select something as simple as #hnmain > tbody
with soup.select('#hnmain > tbody')
because it returns an empty list.
With the code below I am unable to select tbody, whereas with JS in the browser I could (screenshot).
from bs4 import BeautifulSoup
import requests
print("-"*100)
print("Hackernews parser")
print("-"*100)
url="https://news.ycombinator.com/"
res=requests.get(url)
html=res.content
soup=BeautifulSoup(html)
table=soup.select('#hnmain > tbody')
print(table)
OUT:
soup=BeautifulSoup(html)
[]
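I am not getting the tbody tag from BeautifulSoup or from a plain curl request either, so the HTML the server sends does not contain tbody at all; the browser inserts it when it builds the DOM, which is why the JS selector works there. It means
soup.select('tbody')
returns an empty list, and that is the same reason your '#hnmain > tbody' selector returns an empty list.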
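Here is a minimal sketch of the effect, using a made-up HTML snippet instead of the live page (SAMPLE_HTML and the id "stories" are my own stand-ins, not the real Hacker News markup). It assumes the "html.parser" backend, which keeps the tree as written and does not add tbody. Dropping tbody from the selector is one way to reach the nested table:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for what the server sends: tables without <tbody>.
SAMPLE_HTML = """
<table id="hnmain">
  <tr><td>header</td></tr>
  <tr><td>spacer</td></tr>
  <tr><td><table id="stories"><tr><td>story rows</td></tr></table></td></tr>
</table>
"""

# Name the parser explicitly; html.parser does not insert <tbody>.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The browser-style selector finds nothing because there is no <tbody>.
with_tbody = soup.select("#hnmain > tbody")

# The same path without the tbody step matches the nested table.
without_tbody = soup.select("#hnmain > tr:nth-child(3) > td > table")

print(with_tbody)
print([t["id"] for t in without_tbody])
```

If you would rather keep the selector copied from the browser, the html5lib parser (a separate install) is meant to build the tree the way a browser does, so it may be worth trying as well.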
To extract just the links you are looking for, use
soup.select("a.storylink")
It returns the story links from the site.
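As a rough sketch of how you might pull the title and URL out of each match, again on a made-up snippet (SAMPLE_HTML and the example.com URLs are placeholders; the only thing taken from the page is the storylink class mentioned above):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two story links and one unrelated link.
SAMPLE_HTML = """
<table>
  <tr><td><a class="storylink" href="https://example.com/a">First story</a></td></tr>
  <tr><td><a class="storylink" href="https://example.com/b">Second story</a></td></tr>
  <tr><td><a href="/other">not a story</a></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Collect (title, url) pairs for every anchor with the storylink class.
links = [(a.get_text(strip=True), a["href"]) for a in soup.select("a.storylink")]

for title, href in links:
    print(title, href)
```

For the real site you would pass res.content from your requests call in place of SAMPLE_HTML.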