Tags: python · excel · pandas · dataframe · data-mining

Cannot export to ".csv" file - pandas.DataFrame


I would like to seek help with my Google Colaboratory notebook. The error occurs in the fourth cell.

Context:
We're web-scraping BTC's historical data.

Here's my code:

First cell (Executed successfully)

# importing libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd

Second cell (Executed successfully)

#sample url
url = "https://www.bitrates.com/coin/BTC/historical-data/USD?period=allData&limit=500"
#request the page
page = requests.get(url)
#creating a soup object and the parser
soup = BeautifulSoup(page.text, 'lxml')

#creating a table body to pass on the soup to find the table
table_body = soup.find('table')
#creating an empty list to store information
row_data = []

#creating a table 
for row in table_body.find_all('tr'):
  col = row.find_all('td')
  col = [ele.text.strip() for ele in col ] # stripping the whitespaces
  row_data.append(col) #append the column

# extracting all data on table entries
df = pd.DataFrame(row_data)
df

Third cell (Executed successfully)

headers = []
for i in soup.find_all('th'):
  col_name = i.text.strip().lower().replace(" ", "_")
  headers.append(col_name)
headers

Fourth cell (Execution failed)

df = pd.DataFrame(row_data, columns=headers)
df
#into a file 
df.to_csv('/content/file.csv')

The error! :(

AssertionError                            Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/construction.py in _list_to_arrays(data, columns, coerce_float, dtype)
    563     try:
--> 564         columns = _validate_or_indexify_columns(content, columns)
    565         result = _convert_object_array(content, dtype=dtype, coerce_float=coerce_float)
AssertionError: 13 columns passed, passed data had 7 columns

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/construction.py in _list_to_arrays(data, columns, coerce_float, dtype)
    565         result = _convert_object_array(content, dtype=dtype, coerce_float=coerce_float)
    566     except AssertionError as e:
--> 567         raise ValueError(e) from e
    568     return result, columns
    569 

ValueError: 13 columns passed, passed data had 7 columns
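The numbers in the message point at the cause: soup.find_all('th') in the third cell collects header cells from every table on the page (13 names), while each entry in row_data holds only the 7 <td> cells of a row from the first table; the header row itself, having no <td> cells, even contributes an empty list. A minimal reproduction of the same mismatch, with made-up column names:

```python
import pandas as pd

# 7 values per row, but 13 header names -> same mismatch as in the traceback
row_data = [["x"] * 7]
headers = [f"col{i}" for i in range(13)]

try:
    pd.DataFrame(row_data, columns=headers)
except ValueError as e:
    print(e)  # e.g. "13 columns passed, passed data had 7 columns"
```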

Solution

  • To load the table you can simply use pd.read_html(). For example:

    import pandas as pd
    
    url = "https://www.bitrates.com/coin/BTC/historical-data/USD?period=allData&limit=500"
    
    df = pd.read_html(url)[0]
    print(df)
    df.to_csv("data.csv")
    

    Creates data.csv (screenshot from LibreOffice omitted).
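pd.read_html() parses every <table> it finds and returns a list of DataFrames, which is why the [0] index is needed above. A small sketch with an inline HTML string (the StringIO wrapper avoids a deprecation warning in newer pandas, which no longer accepts literal HTML strings directly):

```python
import pandas as pd
from io import StringIO

html = """
<table><tr><th>a</th></tr><tr><td>1</td></tr></table>
<table><tr><th>b</th></tr><tr><td>2</td></tr></table>
"""

# one DataFrame per <table> element, in document order
tables = pd.read_html(StringIO(html))
print(len(tables))                 # 2
print(tables[0].columns.tolist())  # ['a']
print(tables[1].columns.tolist())  # ['b']
```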


    To correct your example, take the headers from the same table (table_body.select("th") instead of soup.find_all('th')) and skip rows that contain no <td> cells:

    # importing libraries
    from bs4 import BeautifulSoup
    import requests
    import pandas as pd
    
    # sample url
    url = "https://www.bitrates.com/coin/BTC/historical-data/USD?period=allData&limit=500"
    # request the page
    page = requests.get(url)
    # creating a soup object and the parser
    soup = BeautifulSoup(page.text, "lxml")
    
    # creating a table body to pass on the soup to find the table
    table_body = soup.find("table")
    # creating an empty list to store information
    row_data = []
    
    # creating a table
    for row in table_body.select("tr:has(td)"):
        col = row.find_all("td")
        col = [ele.text.strip() for ele in col]  # stripping the whitespaces
        row_data.append(col)  # append the column
    
    
    headers = []
    for i in table_body.select("th"):
        col_name = i.text.strip().lower().replace(" ", "_")
        headers.append(col_name)
    
    df = pd.DataFrame(row_data, columns=headers)
    print(df)
    df.to_csv("/content/file.csv")
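The key change is the tr:has(td) selector, which skips rows that contain only <th> cells; plain find_all('tr') would append an empty list for the header row. A toy table showing the difference (the :has() selector needs soupsieve, which is bundled with bs4 4.7+):

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tr><th>date</th><th>price</th></tr>
  <tr><td>2021-01-01</td><td>29000</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# find_all('tr') also visits the header row, which has no <td> cells,
# so it contributes an empty list
all_rows = [[td.text for td in tr.find_all("td")] for tr in table.find_all("tr")]
print(all_rows)  # [[], ['2021-01-01', '29000']]

# tr:has(td) only matches rows that actually contain <td> cells
data_rows = [[td.text for td in tr.find_all("td")] for tr in table.select("tr:has(td)")]
print(data_rows)  # [['2021-01-01', '29000']]
```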