I'm trying to scrape historical ESG data from Sustainalytics on Yahoo Finance using Python. Specifically, let's say I want the last 10 years of ESG scores of a given list of constituents.
The following code lines provides up-to-date ESG scores. But I would like to scrape past-ESG performance. I am essentially looking for yearly (monthly, if possible) ESG from Jan 2010 till Dec 2020. I would like to automate the scraping and save the data in a txt or csv file.
# import yfinance, pandas and os
import yfinance as yf
import pandas as pd
import os
Code for a single ticker:
msft = "MSFT"
msft_y = yf.Ticker(msft)
esg_data = pd.DataFrame.transpose(msft_y.sustainability)
esg_data['company_ticker'] = str(msft_y ticker)
It returns a dataframe of 27 rows covering ESG-relevant information for Microsoft.
Code for the S&P 500:
# Import list of tickers from file
os.chdir("C:\...")
sp_500 = pd.read_csv('SP_500_tickers.csv')
# Retrieve Yahoo! Finance Sustainability Scores for each ticker
for i in sp_500['ticker_code']:
# print(i)
i_y = yf.Ticker(i)
try:
if i_y.sustainability is not None:
temp = pd.DataFrame.transpose(i_y.sustainability)
temp['company_ticker'] = str(i_y.ticker)
# print(temp)
esg_data = esg_data.append(temp)
except IndexError:
pass
It returns a dataframe of ESG data for the S&P500 constituents, which we can use for analysis. The idea behind is to create portfolios of “good” and “bad” ESG companies and compare performance to see how share prices have performed during different historical time periods.
So far, these codes cannot fetch ESG data for past dates.
I was not able to use requests.get as the connection was forbidden. Error 403. So I tried using urllib.request.urlopen from the StackOverflow board below. Visit Python; urllib error: AttributeError: 'bytes' object has no attribute 'read'
import pandas as pd
from datetime import datetime as dt
import urllib.request
import json
dataframes = []
url = "https://query2.finance.yahoo.com/v1/finance/esgChart?symbol=A"
connection = urllib.request.urlopen(url)
data = connection.read()
data_2 = json.loads(data)
Formatdata = data_2["esgChart"]["result"][0]["symbolSeries"]
Formatdata_2 = pd.DataFrame(Formatdata)
Formatdata_2["timestamp"] = pd.to_datetime(Formatdata_2["timestamp"], unit="s")
Preview prints the data as shown by putty
print(Formatdata_2.head())
The rest of the code was adapted from putty