I would like to seek help regarding my Google Colaboratory notebook. The error occurs in the fourth cell.
Context:
We're web-scraping BTC's historical data.
Here's my code:
First cell (Executed successfully)
#importing libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd
Second cell (Executed successfully)
#sample url
url = "https://www.bitrates.com/coin/BTC/historical-data/USD?period=allData&limit=500"
#request the page
page = requests.get(url)
#creating a soup object and the parser
soup = BeautifulSoup(page.text, 'lxml')
#creating a table body to pass on the soup to find the table
table_body = soup.find('table')
#creating an empty list to store information
row_data = []
#creating a table
for row in table_body.find_all('tr'):
    col = row.find_all('td')
    col = [ele.text.strip() for ele in col]  # stripping the whitespace
    row_data.append(col)  # append the row
# extracting all data on table entries
df = pd.DataFrame(row_data)
df
Third cell (Executed successfully)
headers = []
for i in soup.find_all('th'):
    col_name = i.text.strip().lower().replace(" ", "_")
    headers.append(col_name)
headers
Fourth cell (Execution failed)
df = pd.DataFrame(row_data, columns=headers)
df
#into a file
df.to_csv('/content/file.csv')
AssertionError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/construction.py in _list_to_arrays(data, columns, coerce_float, dtype)
563 try:
--> 564 columns = _validate_or_indexify_columns(content, columns)
565 result = _convert_object_array(content, dtype=dtype, coerce_float=coerce_float)
AssertionError: 13 columns passed, passed data had 7 columns
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/construction.py in _list_to_arrays(data, columns, coerce_float, dtype)
565 result = _convert_object_array(content, dtype=dtype, coerce_float=coerce_float)
566 except AssertionError as e:
--> 567 raise ValueError(e) from e
568 return result, columns
569
ValueError: 13 columns passed, passed data had 7 columns
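The mismatch comes from where the headers are collected: soup.find_all('th') picks up every <th> on the page (13 of them, presumably from other tables or header cells outside the table you scraped), while each data row in the first table has only 7 <td> cells. On top of that, the header <tr> has no <td> children, so it contributes an empty list to row_data. You can confirm this with your existing soup and table_body objects (a quick diagnostic sketch):

print(len(soup.find_all('th')))        # 13 in your run; counts <th> cells page-wide
print(len(table_body.find_all('th')))  # header cells of the first table only
print(row_data[0])                     # likely [] - the header row has no <td> cells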
To load the table, you can simply use pd.read_html(). For example:
import pandas as pd
url = "https://www.bitrates.com/coin/BTC/historical-data/USD?period=allData&limit=500"
df = pd.read_html(url)[0]
print(df)
df.to_csv("data.csv")
This creates data.csv (screenshot from LibreOffice omitted).
To correct your example:
# importing libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd
# sample url
url = "https://www.bitrates.com/coin/BTC/historical-data/USD?period=allData&limit=500"
# request the page
page = requests.get(url)
# creating a soup object and the parser
soup = BeautifulSoup(page.text, "lxml")
# creating a table body to pass on the soup to find the table
table_body = soup.find("table")
# creating an empty list to store information
row_data = []
# creating a table
for row in table_body.select("tr:has(td)"):
    col = row.find_all("td")
    col = [ele.text.strip() for ele in col]  # stripping the whitespace
    row_data.append(col)  # append the row
# collecting the column headers from the same table, not the whole page
headers = []
for i in table_body.select("th"):
    col_name = i.text.strip().lower().replace(" ", "_")
    headers.append(col_name)
df = pd.DataFrame(row_data, columns=headers)
print(df)
df.to_csv("/content/file.csv")