HTML table scraping in Python


I want to scrape an HTML table with this code:

import requests
from bs4 import BeautifulSoup

page1 = requests.get("http://kworb.net/spotify/country/br_weekly.html")
soup = BeautifulSoup(page1.content, 'html.parser')
for tr in soup.findAll('tr'):
    tds = tr.find_all('td')
    print(tds[0].text)

It seems to work: I get the table, and the cells of each row end up in their own tds list. But when I try to print the first column of each row (tds[0].text), I get an error.

Could you provide some clues?


Solution

  • The first row contains header cells (<th>) instead of <td>, so tr.find_all('td') returns an empty list for that row and tds[0] raises an IndexError. Check that tds is not empty before indexing it:

    if len(tds) > 0:
        print(tds[0].text)
    

    Or, shorter, since an empty list is falsy:

    if tds:
        print(tds[0].text)
    

    Or you can skip the first row with the slice [1:] (this assumes the header is the only row without <td> cells):

    for tr in soup.find_all('tr')[1:]:
        tds = tr.find_all('td')
        print(tds[0].text)
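
For reference, here is a minimal sketch of the empty-check approach that runs without a network connection. The SAMPLE_HTML string and the first_column helper are made up for illustration (they are not from the question or from the kworb.net page); the sample only assumes the same shape as the real table, with one <th> header row followed by <td> data rows.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: one header row, two data rows.
SAMPLE_HTML = """
<table>
  <tr><th>Pos</th><th>Track</th></tr>
  <tr><td>1</td><td>Song A</td></tr>
  <tr><td>2</td><td>Song B</td></tr>
</table>
"""


def first_column(html):
    """Return the text of the first <td> in every row that has one."""
    soup = BeautifulSoup(html, "html.parser")
    values = []
    for tr in soup.find_all("tr"):
        tds = tr.find_all("td")
        if tds:  # the header row holds only <th> cells, so tds is empty there
            values.append(tds[0].get_text(strip=True))
    return values


print(first_column(SAMPLE_HTML))  # ['1', '2']
```

Checking for an empty list is probably the safer of the two options when a page may contain several header rows or rows without cells, because the [1:] slice only drops the very first row.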