
AttributeError: 'NoneType' object has no attribute 'find_all' when retrieving a data table


I want to scrape a data table from a web page, but an error occurs in the for loop.

Below is my code:

table_body = soup.find("tbody")
rows = []

for row in table_body.find_all("tr"):
    cols = row.find_all("td")
    rd = cols[0].text.strip()
    mv = cols[1].text.strip()
    pb = cols[2].text.strip()
    dg = cols[3].text.strip()
    wg = cols[4].text.strip()
        
    rows.append([rd, mv, pb, dg, wg])

headers = ["Release Date", "Movie", "Production Budget", "Domestic Gross", "Worldwide Gross"]

This is the error message I get when I run the code:

AttributeError                            Traceback (most recent call last)
Cell In[4], line 4
      1 table_body = soup.find("tbody")
      2 rows = []
----> 4 for row in table_body.find_all("tr"):
      5     cols = row.find_all("td")
      6     rd = cols[0].text.strip()

AttributeError: 'NoneType' object has no attribute 'find_all'

What should I do?


Solution

  • In the HTML source, there is no <tbody> tag, so soup.find("tbody") returns None, and calling find_all("tr") on None raises the AttributeError.

    Change table_body = soup.find("tbody") to table_body = soup.find("table").

    Or, as an even easier solution, let pandas parse the table:

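    from io import StringIO

    import pandas as pd
    import requests
    
    url = "https://www.the-numbers.com/movie/budgets/all"
    # Send a browser-like User-Agent with the request
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36'}
    response = requests.get(url, headers=headers)
    
    # Wrap the HTML in StringIO: recent pandas versions deprecate passing literal HTML strings
    # read_html returns a list of DataFrames, one per table; take the first
    df = pd.read_html(StringIO(response.text))[0]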
    

    Output:

    print(df)
        Unnamed: 0  Release Date  ... Domestic Gross Worldwide Gross
    0            1  Dec 16, 2015  ...   $936,662,225  $2,064,615,817
    1            2   Dec 9, 2022  ...   $684,075,767  $2,317,514,386
    2            3  Jun 28, 2023  ...   $174,480,468    $383,963,057
    3            4  Apr 23, 2019  ...   $858,373,000  $2,748,242,781
    4            5  May 20, 2011  ...   $241,071,802  $1,045,713,802
    ..         ...           ...  ...            ...             ...
    95          96  Feb 28, 2020  ...    $61,555,145    $133,357,601
    96          97  Nov 22, 2023  ...    $61,524,375    $217,917,156
    97          98  Dec 16, 2020  ...    $46,801,036    $166,360,232
    98          99  Jan 31, 2024  ...    $45,207,275     $96,210,855
    99         100   Sep 4, 2020  ...             $0     $69,973,540
    
    [100 rows x 6 columns]
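
For the first fix (searching for the <table> instead of <tbody>), here is a minimal sketch with an explicit None check, so that a missing table gives an empty result rather than the AttributeError. SAMPLE_HTML and the extract_rows helper are hypothetical stand-ins for illustration, not the real page or part of the original code. Note that the pandas output above shows six columns, with a leading rank column, so on the real page the column indexes may need shifting by one.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
# Like the page in the question, it has no <tbody> tag.
SAMPLE_HTML = """
<table>
  <tr><th>Release Date</th><th>Movie</th><th>Production Budget</th>
      <th>Domestic Gross</th><th>Worldwide Gross</th></tr>
  <tr><td>Jan 1, 2000</td><td>Example Movie</td><td>$100</td>
      <td>$200</td><td>$300</td></tr>
  <tr><td>Feb 2, 2001</td><td>Another Movie</td><td>$10</td>
      <td>$20</td><td>$30</td></tr>
</table>
"""


def extract_rows(html, n_cols=5):
    """Return the stripped text of the first n_cols cells of each data row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        # find() returns None when nothing matches; stop here instead of
        # calling find_all() on None.
        return []

    rows = []
    for tr in table.find_all("tr"):
        cols = tr.find_all("td")
        if len(cols) < n_cols:
            # Skip header rows (<th> only) and any short or malformed rows.
            continue
        rows.append([td.text.strip() for td in cols[:n_cols]])
    return rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Skipping rows with too few <td> cells also avoids an IndexError on the header row, which contains only <th> cells.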