Tags: python, web-scraping, beautifulsoup, python-3.7

BeautifulSoup soup.select() returns an empty list for a CSS selector


I am trying to parse some links from this site: https://news.ycombinator.com/

I want to select a specific table. In the browser, this selector finds it:

document.querySelector("#hnmain > tbody > tr:nth-child(3) > td > table")

I know there are CSS selector limitations in bs4. But the problem is that I can't even select something as simple as #hnmain > tbody: soup.select('#hnmain > tbody') returns an empty list.

With the code below, I am unable to select tbody, whereas the JavaScript selector above works in the browser (screenshot).

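from bs4 import BeautifulSoup
import requests

print("-" * 100)
print("Hackernews parser")
print("-" * 100)

url = "https://news.ycombinator.com/"
res = requests.get(url)
html = res.content

# No parser is named here, so bs4 picks one itself and emits a warning.
soup = BeautifulSoup(html)

# Same path the browser accepts; on the fetched HTML it matches nothing.
table = soup.select('#hnmain > tbody')
print(table)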

Output (after the bs4 warning that points at the soup=BeautifulSoup(html) line):

[]

screenshot


Solution

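  • The tbody tag is not in the HTML I get back, either through BeautifulSoup or with a curl script. It means

    soup.select('tbody')
    

    returns an empty list, and that is the same reason your selector returns an empty list. The browser inserts tbody itself when it builds the DOM from the table markup, so a selector copied from the browser will not match the raw HTML.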

    To extract just the links you are looking for, use

    soup.select("a.storylink")
    

    This returns the story links from the site.
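
The behaviour described above can be reproduced offline. The following is a minimal sketch, not the actual page: SAMPLE_HTML is a hypothetical, trimmed-down imitation of the layout in the question (the itemlist class and the example.com links are made up), and it assumes the built-in "html.parser" backend, which leaves the markup as sent and does not add a tbody. It also assumes the links still carry the storylink class named in the answer; the live site's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup imitating the layout from the question.
# Like the raw server response, it contains no <tbody> element.
SAMPLE_HTML = """
<table id="hnmain">
  <tr><td>header</td></tr>
  <tr><td>spacer</td></tr>
  <tr><td>
    <table class="itemlist">
      <tr><td><a class="storylink" href="https://example.com/a">Story A</a></td></tr>
      <tr><td><a class="storylink" href="https://example.com/b">Story B</a></td></tr>
    </table>
  </td></tr>
</table>
"""

# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The source has no <tbody>, so a selector that requires one matches nothing.
with_tbody = soup.select("#hnmain > tbody")

# Dropping tbody from the browser's selector path reaches the inner table.
inner_tables = soup.select("#hnmain > tr:nth-child(3) > td > table")

# The shortcut from the answer: select the story links directly by class.
links = [(a.get_text(), a["href"]) for a in soup.select("a.storylink")]

print(with_tbody)
print(len(inner_tables))
print(links)
```

If the browser's selector path is needed as-is, the usual adjustment is to remove every tbody step from it before passing it to soup.select(), as done for inner_tables here.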