Tags: python, pandas, beautifulsoup, export-to-csv

Why does BeautifulSoup fail to extract data from websites to csv?


User Chrisvdberge helped me create the following code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url_DAX = 'https://www.eurexchange.com/exchange-en/market-data/statistics/market-statistics-online/100!onlineStats?viewType=4&productGroupId=13394&productId=34642&cp=&month=&year=&busDate=20191114'
req = requests.get(url_DAX, verify=False)
html = req.text
soup = BeautifulSoup(html, 'lxml')
df = pd.read_html(str(html))[0]
df.to_csv('results_DAX.csv')
print(df)

url_DOW = 'https://www.cmegroup.com/trading/equity-index/us-index/e-mini-dow_quotes_settlements_futures.html'
req = requests.get(url_DOW, verify=False)
html = req.text
soup = BeautifulSoup(html, 'lxml')
df = pd.read_html(str(html))[0]
df.to_csv('results_DOW.csv')
print(df)

url_NASDAQ = 'https://www.cmegroup.com/trading/equity-index/us-index/e-mini-nasdaq-100_quotes_settlements_futures.html'
req = requests.get(url_NASDAQ, verify=False)
html = req.text
soup = BeautifulSoup(html, 'lxml')
df = pd.read_html(str(html))[0]
df.to_csv('results_NASDAQ.csv')
print(df)

url_CAC = 'https://live.euronext.com/fr/product/index-futures/FCE-DPAR/settlement-prices'
req = requests.get(url_CAC, verify=False)
html = req.text
soup = BeautifulSoup(html, 'lxml')
df = pd.read_html(str(html))[0]
df.to_csv('results_CAC.csv')
print(df)

I get the following results:

  • 3 .csv files are created: results_DAX.csv (everything is OK here; it contains the values I want), results_DOW.csv and results_NASDAQ.csv (the problem is that these .csv files don't contain the wanted values, and I don't understand why).

  • As you can see in the code, 4 files should be created, not just 3.

So my questions are:

  • How can I get 4 csv files?

  • How can I get the values into the results_DOW.csv and results_NASDAQ.csv files? (and maybe also into the results_CAC.csv file)

Thank you for your answers! :)


Solution

  • Try this to get the other sites. The last site is a little trickier, so you'd need to try out Selenium; a sketch for it follows the code below:

    import pandas as pd
    import requests
    from datetime import date, timedelta
    
    # DAX: the Eurex page serves a plain HTML table, so read_html can parse the URL directly
    url_DAX = 'https://www.eurexchange.com/exchange-en/market-data/statistics/market-statistics-online/100!onlineStats?viewType=4&productGroupId=13394&productId=34642&cp=&month=&year=&busDate=20191114'
    df = pd.read_html(url_DAX)[0]
    df.to_csv('results_DAX.csv')
    print(df)
    
    
    # The CME pages build their tables with JavaScript, so read_html on the raw page HTML
    # doesn't return the settlement values. Query the JSON endpoint that feeds those tables instead.
    # Go back a couple of days so the requested trade date already has published settlements.
    dt = date.today() - timedelta(days=2)
    dateParam = dt.strftime('%m/%d/%Y')
    
    
    # E-mini Dow settlements (product 318)
    url_DOW = 'https://www.cmegroup.com/CmeWS/mvc/Settlements/Futures/Settlements/318/FUT'
    payload = {
        'tradeDate': dateParam,
        'strategy': 'DEFAULT',
        'pageSize': '500',
        '_': '1573920502874'}  # '_' is just a cache-busting timestamp taken from the site's own request
    response = requests.get(url_DOW, params=payload).json()
    df = pd.DataFrame(response['settlements'])
    df.to_csv('results_DOW.csv')
    print(df)
    
    
    # E-mini Nasdaq-100 settlements (product 146)
    url_NASDAQ = 'https://www.cmegroup.com/CmeWS/mvc/Settlements/Futures/Settlements/146/FUT'
    payload = {
        'tradeDate': dateParam,
        'strategy': 'DEFAULT',
        'pageSize': '500',
        '_': '1573920650587'}
    response = requests.get(url_NASDAQ, params=payload).json()
    df = pd.DataFrame(response['settlements'])
    df.to_csv('results_NASDAQ.csv')
    print(df)
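
  • For the last site (the Euronext page for the CAC), the settlement table appears to be built by JavaScript, so requests alone won't see it. Here is a minimal Selenium sketch under that assumption; the fixed sleep and the choice of the first table are guesses you may need to adjust:

    # Sketch only: assumes Chrome and a matching chromedriver are available,
    # and that the rendered page exposes the settlement prices as its first table.
    import time
    
    import pandas as pd
    from selenium import webdriver
    
    url_CAC = 'https://live.euronext.com/fr/product/index-futures/FCE-DPAR/settlement-prices'
    
    driver = webdriver.Chrome()   # or webdriver.Firefox()
    try:
        driver.get(url_CAC)
        time.sleep(5)             # crude wait for the JavaScript-built table to appear
        # read_html raises ValueError if the rendered page still has no table
        df = pd.read_html(driver.page_source)[0]
        df.to_csv('results_CAC.csv')
        print(df)
    finally:
        driver.quit()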