Search code examples
pythonreplacelookup

Index, lookup and replace


I am working with yfinance data, which uses stock tickers. However, these stock tickers are often not easily recognisable. Hence, after performing download functions, i am trying to replace "stock tickers" by looking up a list and then replacing original stock ticker in the dataframe. I am rather new to dictionaries and the index/replace concept. Could anyone please point me in the right direction?

import yfinance as yf
import pandas as pd
import csv

tickerCsv = open('Stocks.csv', 'r')
reader = csv.reader(tickerCsv)
next(reader, None)  # skip the headers
tickers = {rows[0] for rows in reader}

data = yf.download(tickers, period='1mo', start="2009-01-01", end="2020-12-31", group_by='ticker')

close_prices = pd.DataFrame()

for ticker in tickers:
    close_prices[ticker] = data[ticker]['Adj Close']
    close_prices[ticker].replace(dict(zip(reader['Symbol'], reader['Name'])))

Stocks.csv file as follows:

Stock tickers on the left, Name I would like to replace with after the yfinance download


Solution

  • Your code is pretty much there. I would use pandas to read the csv file and generate the symbol/name map:

    dd = pd.read_csv('Stocks.csv').set_index('Symbol')['Name'].to_dict()
    

    Output:

    {'C38U.SI': 'CICT', 'A17U.SI': 'CLAR', 'M44U.SI': 'MLT', 'ME8U.SI': 'MINT'}
    

    Then tickers is just the keys of dd:

    tickers = list(dd)
    

    Finally, you want to replace the column names in close_prices which you can do like this:

    close_prices.columns = close_prices.columns.map(dd)
    

    In total:

    import yfinance as yf
    import pandas as pd
    import csv
    
    dd = pd.read_csv('Stocks.csv').set_index('Symbol')['Name'].to_dict()
    tickers = list(dd)
    
    data = yf.download(tickers, period='1mo', start="2009-01-01", end="2020-12-31", group_by='ticker')
    
    close_prices = pd.DataFrame()
    
    for ticker in tickers:
        close_prices[ticker] = data[ticker]['Adj Close']
    
    close_prices.columns = close_prices.columns.map(dd)
    

    Output:

                    CICT      CLAR       MLT      MINT
    Date
    2009-01-02  0.601655  0.538735  0.144619       NaN
    2009-01-05  0.631185  0.600304  0.156840       NaN
    2009-01-06  0.631185  0.615697  0.154804       NaN
    2009-01-07  0.660713  0.596456  0.158877       NaN
    2009-01-08  0.642258  0.577216  0.158877       NaN
    ...              ...       ...       ...       ...
    2020-12-23  1.827041  2.496265  1.674264  2.397144
    2020-12-24  1.810045  2.487803  1.691262  2.380439
    2020-12-28  1.810045  2.487803  1.691262  2.380439
    2020-12-29  1.835539  2.504726  1.691262  2.388792
    2020-12-30  1.852535  2.547036  1.708259  2.413848
    
    [3012 rows x 4 columns]