Search code examples
pythonpandasweb-scrapingyahoo-finance

How to fetch historical ESG data from Yahoo?


I'm trying to scrape historical ESG data from Sustainalytics on Yahoo Finance using Python. Specifically, let's say I want the last 10 years of ESG scores of a given list of constituents.

The following code lines provides up-to-date ESG scores. But I would like to scrape past-ESG performance. I am essentially looking for yearly (monthly, if possible) ESG from Jan 2010 till Dec 2020. I would like to automate the scraping and save the data in a txt or csv file.

# import yfinance, pandas and os
import yfinance as yf
import pandas as pd
import os

Code for a single ticker:

msft = "MSFT"
msft_y = yf.Ticker(msft)
esg_data = pd.DataFrame.transpose(msft_y.sustainability)
esg_data['company_ticker'] = str(msft_y ticker)

It returns a dataframe of 27 rows covering ESG-relevant information for Microsoft.

Code for the S&P 500:

# Import list of tickers from file
os.chdir("C:\...")
sp_500 = pd.read_csv('SP_500_tickers.csv')
# Retrieve Yahoo! Finance Sustainability Scores for each ticker
for i in sp_500['ticker_code']:
    # print(i)
    i_y = yf.Ticker(i)
    try:
        if i_y.sustainability is not None:
            temp = pd.DataFrame.transpose(i_y.sustainability)
            temp['company_ticker'] = str(i_y.ticker)
            # print(temp)
            esg_data = esg_data.append(temp)
    except IndexError:
        pass

It returns a dataframe of ESG data for the S&P500 constituents, which we can use for analysis. The idea behind is to create portfolios of “good” and “bad” ESG companies and compare performance to see how share prices have performed during different historical time periods.

So far, these codes cannot fetch ESG data for past dates.


Solution

  • I was not able to use requests.get as the connection was forbidden. Error 403. So I tried using urllib.request.urlopen from the StackOverflow board below. Visit Python; urllib error: AttributeError: 'bytes' object has no attribute 'read'

    import pandas as pd
    from datetime import datetime as dt
    import urllib.request
    import json
    dataframes = []
    url = "https://query2.finance.yahoo.com/v1/finance/esgChart?symbol=A"
    
    connection = urllib.request.urlopen(url)
    
    data = connection.read()
    data_2 = json.loads(data)
    Formatdata = data_2["esgChart"]["result"][0]["symbolSeries"]
    Formatdata_2 = pd.DataFrame(Formatdata)
    Formatdata_2["timestamp"] = pd.to_datetime(Formatdata_2["timestamp"], unit="s")
    

    Preview prints the data as shown by putty

    print(Formatdata_2.head())
    

    The rest of the code was adapted from putty