I want to scrape a data table from a web page, but an error occurs in my for loop.
Below is my code:
table_body = soup.find("tbody")
rows = []

for row in table_body.find_all("tr"):
    cols = row.find_all("td")
    rd = cols[0].text.strip()
    mv = cols[1].text.strip()
    pb = cols[2].text.strip()
    dg = cols[3].text.strip()
    wg = cols[4].text.strip()
    rows.append([rd, mv, pb, dg, wg])

headers = ["Release Date", "Movie", "Production Budget", "Domestic Gross", "Worldwide Gross"]
This is the error message I get when I run the code:
AttributeError                            Traceback (most recent call last)
Cell In[4], line 4
      1 table_body = soup.find("tbody")
      2 rows = []
----> 4 for row in table_body.find_all("tr"):
      5     cols = row.find_all("td")
      6     rd = cols[0].text.strip()

AttributeError: 'NoneType' object has no attribute 'find_all'
What should I do?
In the HTML source there is no <tbody> tag, so soup.find("tbody") returns None. Calling find_all("tr") on None is what raises the AttributeError.
Change table_body = soup.find("tbody")
to table_body = soup.find("table")
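One thing to watch for after this change: iterating over every <tr> in the table can also pick up a header row whose cells are <th> rather than <td>, in which case cols[0] would raise an IndexError. Here is a minimal sketch of a more defensive loop. The HTML string and the extract_rows helper are made up for illustration, not taken from the real page, and the column positions are an assumption (the real table may have an extra leading column, so the indices may need shifting).

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: a table with no <tbody> tag.
html = """
<table>
  <tr><th>Release Date</th><th>Movie</th><th>Production Budget</th>
      <th>Domestic Gross</th><th>Worldwide Gross</th></tr>
  <tr><td>Jan 1, 2000</td><td>Example Movie</td><td>$1,000</td>
      <td>$2,000</td><td>$3,000</td></tr>
</table>
"""


def extract_rows(soup):
    """Return the text of the first five <td> cells of each data row."""
    table = soup.find("table")
    if table is None:
        # find() returns None when nothing matches, so check before using it
        return []
    rows = []
    for row in table.find_all("tr"):
        cols = row.find_all("td")
        if len(cols) < 5:
            # Skip rows without enough <td> cells, such as a <th> header row
            continue
        rows.append([col.text.strip() for col in cols[:5]])
    return rows


soup = BeautifulSoup(html, "html.parser")
rows = extract_rows(soup)
print(rows)
```

Checking for None right after find() turns a confusing AttributeError deep in the loop into an explicit, easy-to-spot empty result.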
Or, as an even easier solution, let pandas parse the table:
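from io import StringIO

import pandas as pd
import requests

url = "https://www.the-numbers.com/movie/budgets/all"
# Send a browser-like User-Agent header with the request
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36'}
response = requests.get(url, headers=headers)
# Wrap the HTML in StringIO: recent pandas versions deprecate passing a literal HTML string
df = pd.read_html(StringIO(response.text))[0]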
Output:
print(df)
    Unnamed: 0  Release Date  ...  Domestic Gross  Worldwide Gross
0            1  Dec 16, 2015  ...    $936,662,225   $2,064,615,817
1            2   Dec 9, 2022  ...    $684,075,767   $2,317,514,386
2            3  Jun 28, 2023  ...    $174,480,468     $383,963,057
3            4  Apr 23, 2019  ...    $858,373,000   $2,748,242,781
4            5  May 20, 2011  ...    $241,071,802   $1,045,713,802
..         ...           ...  ...             ...              ...
95          96  Feb 28, 2020  ...     $61,555,145     $133,357,601
96          97  Nov 22, 2023  ...     $61,524,375     $217,917,156
97          98  Dec 16, 2020  ...     $46,801,036     $166,360,232
98          99  Jan 31, 2024  ...     $45,207,275      $96,210,855
99         100   Sep 4, 2020  ...              $0      $69,973,540

[100 rows x 6 columns]
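The gross columns in that output are strings such as "$936,662,225", so they cannot be summed or sorted numerically as they stand. If you need numbers, one possible follow-up is sketched below. The sample DataFrame is made up to mimic that format (it is not data from the site), and money_cols is just an illustrative name.

```python
import pandas as pd

# Made-up sample rows in the same "$1,234,567" string format as the parsed table
df = pd.DataFrame({
    "Movie": ["Example A", "Example B"],
    "Domestic Gross": ["$1,234,567", "$0"],
    "Worldwide Gross": ["$2,345,678", "$69,973,540"],
})

money_cols = ["Domestic Gross", "Worldwide Gross"]
for col in money_cols:
    # Remove "$" and "," characters, then convert the digits to integers
    df[col] = df[col].str.replace(r"[$,]", "", regex=True).astype("int64")

print(df.dtypes)
```

After this step the columns can be used in arithmetic and numeric sorting, for example df.sort_values("Worldwide Gross").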