Unable to scrape table content served as data:application/octet-stream using Python

I am trying to scrape some data from a website, but the data is contained in an iframe. I first scraped the iframe's source link, but I cannot scrape the data from that source either. I need help extracting the data from this source link: https://chartviewer-europublic.bigapis.net/nzgaV/index.html

I am also sharing a screenshot showing the URL of the data download button under the "a" tag, but I am not able to extract this link either.

[Screenshot: download button URL under the "a" tag]

Here is the code I have used, with BeautifulSoup for the scraping.

# Libraries

from bs4 import BeautifulSoup
import requests

# Original website link
url_spain_total = "https://anfac.com/cifras-clave/matriculaciones-turismos-y-todoterreno/"

# Download the page HTML and parse it
page_total = requests.get(url_spain_total).text

soup_spain_total = BeautifulSoup(page_total, "lxml")

print(soup_spain_total.prettify())

# Getting the list of iframes on the page
result_spain = soup_spain_total.find_all("iframe")
result_spain

# Getting the required source link (src of the second iframe)
total_main_link = result_spain[1]["src"]
total_main_link

After getting the source link, I am not able to extract the table contents.

Any help is appreciated. Thanks in advance!

Solution

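The table appears to be rendered by JavaScript in the browser, which would explain why requests and BeautifulSoup do not find it in the raw HTML. The following is an example of how you can get that data using Selenium, which drives a real browser and can wait for the table to appear: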

from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
# chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-notifications")
chrome_options.add_argument("--window-size=1920,1080")

webdriver_service = Service("chromedriver/chromedriver")  # path to where you saved the chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(browser, 20)  # wait up to 20 seconds

url = "https://chartviewer-europublic.bigapis.net/nzgaV/index.html"
browser.get(url)

# Wait until the element with id "datatable" is visible and enabled
table = wait.until(EC.element_to_be_clickable((By.ID, "datatable")))

# Wrap the HTML in StringIO: recent pandas versions deprecate passing literal HTML strings
df = pd.read_html(StringIO(table.get_attribute("outerHTML")))[0]
print(df)

browser.quit()

This will get the information as a DataFrame and display it in the terminal:

	Categoría	Ago-22	Ago-21	% Variacion	Acumulado 2022	Acumulado 2021	% Variacion Acumulado
0	Gasolina	22.3402	20.0702	11311.31	231.348	279.89	-17-17.34
1	Diesel	8.9639	8.06481	11211.15	92.9799	119.641	-22-22.29
2	Resto	20.6042	19.4492	595.94	208.715	188.782	1110.56
3	Total combustibles	51.9075	47.5835	919.09	533.043	588.314	-9-9.39
4	Particular	24.9512	26.0833	-4,3-4.34	233.413	236.728	-1-1.4
5	Empresa	21.7122	17.6732	22922.85	224.337	215.654	44.03
6	Alquiler	5.24452	3.82738	37037.03	75.2928	135.931	-45-44.61
7	Total canales	51.9075	47.5835	919.09	533.043	588.314	-9-9.39

Note that in the percentage columns some cells (for example -17-17.34) appear to contain two values run together, so those columns likely need cleaning before use.

The Selenium setup above is for Linux. If you are on Windows or Mac, searching the Selenium questions on this forum will turn up many examples of Selenium/chromedriver setups for those platforms.
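
As a possible way to avoid the platform-specific driver path, here is a minimal sketch that assumes Selenium 4.6 or later, where Selenium Manager can locate a matching driver automatically. The helper names `build_chrome_args` and `make_browser` are hypothetical, and the `--headless=new` flag assumes a recent Chrome (109 or newer).

```python
from __future__ import annotations


def build_chrome_args(headless: bool = True) -> list[str]:
    """Return the Chrome command-line flags used for scraping (hypothetical helper)."""
    args = ["--no-sandbox", "--disable-notifications", "--window-size=1920,1080"]
    if headless:
        # Assumption: Chrome 109+; older versions use plain "--headless".
        args.append("--headless=new")
    return args


def make_browser(headless: bool = True):
    """Start Chrome without an explicit chromedriver path (hypothetical helper)."""
    # Imported lazily so this module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in build_chrome_args(headless):
        options.add_argument(arg)
    # No Service(path) here: Selenium 4.6+ resolves the driver via Selenium Manager.
    return webdriver.Chrome(options=options)
```

Calling `browser = make_browser(headless=False)` should then give a browser object that can be used with the same `WebDriverWait` code as in the example above, on Linux, Windows, or Mac.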

The Selenium documentation is also helpful: https://www.selenium.dev/documentation/webdriver/getting_started/
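
Regarding the download link mentioned in the question: a `data:application/octet-stream` href usually carries the file contents inside the URL itself, so there is nothing further to download. Below is a minimal sketch of decoding such a link with the standard library. It assumes you already have the href string (for instance from the `a` tag's `href` attribute in Selenium) and that the payload is UTF-8 CSV text. `decode_data_uri` and `sample_href` are hypothetical, and the sample data is made up, not taken from the site.

```python
import base64
import csv
import io
from urllib.parse import unquote_to_bytes


def decode_data_uri(uri: str) -> bytes:
    """Return the payload bytes of a data: URI (hypothetical helper)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    # Everything before the first comma is the media type and flags.
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data: URI has no comma between header and payload")
    raw = unquote_to_bytes(payload)  # undo percent-encoding
    if header.lower().endswith(";base64"):
        return base64.b64decode(raw)
    return raw


# Made-up sample href for illustration; the real link's content will differ.
sample_href = (
    "data:application/octet-stream;charset=utf-8,"
    "Categoria%2CValor%0AGasolina%2C1%0ADiesel%2C2"
)

# Assumption: the payload is CSV text encoded as UTF-8.
csv_text = decode_data_uri(sample_href).decode("utf-8")
rows = list(csv.reader(io.StringIO(csv_text)))
print(rows)
```

If the real link decodes cleanly this way, the resulting text could also be passed to `pd.read_csv(io.StringIO(csv_text))` instead of parsing the rendered HTML table.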