Search code examples
pythonpandassortingfor-looprow

how to append a distinct value for each iteration to a pandas dataframe through a loop?


I am trying to add the stock name value for each iteration to it's corresponding data bulk upload to a pandas dataframe:

This is what I've tried so far:

from pandas_datareader import data as pdr
import requests
from bs4 import BeautifulSoup
import json, requests
import pandas as pd
import re
import numpy as np
import pandas_datareader.data as web
import yfinance as yfin
from tqdm import tqdm
import numpy as np
import datetime
from datetime import timedelta

################# fetch series names for sic ######################

sic_emisoras_df = pd.json_normalize(
    json.loads(
        requests.get('https://www.bmv.com.mx/es/Grupo_BMV/BmvJsonGeneric?idSitioPagina=6&mercado=CGEN_SCSOP&tipoValor=CGEN_CASEO&random=5845')
            .text
            .split(';(', 1)[-1]
            .split(')')[0]
        )['response']['resultado']
).dropna(axis=1, how='all')



####################################################################


# define time range:
start=datetime.date.today()-datetime.timedelta(days=14)

end=datetime.date.today()

# fetch data
# get all SIC names as list
stock_names = sic_emisoras_df["cveCorta"].values.tolist()


# append information per stock name

sic_market_df = pd.DataFrame([])
sic_market_df["stock_name"] = np.nan


for i in tqdm(stock_names):

     # fetch data per stock_name
    try:
       yfin.pdr_override()

       # append stock name
       sic_market_df["stock_name"]=i

       # fetch information by stock name
       data = web.DataReader(i,start,end)
       # append rows to empty dataframe
       sic_market_df = sic_market_df.append(data)
    except KeyError:
       pass



print("Fetched sic_market_df!")

And the output only fetches the name for the first iteration but gets NaN for every other bulk upload :

         stock_name Open    High       Low        Close   Adj   Close    Volume
2024-02-20  ZS  14.500000   14.950000   14.490000   14.700000   14.700000   30253100.0
2024-02-21  ZS  14.590000   14.860000   14.570000   14.790000   14.790000   23032400.0
2024-02-22  ZS  14.940000   15.280000   14.890000   15.240000   15.240000   35702500.0
2024-02-23  ZS  15.150000   15.290000   14.950000   15.130000   15.130000   22914900.0
2024-02-26  ZS  15.130000   15.480000   15.130000   15.280000   15.280000   23675800.0

I would like to get a dataframe that identifies each iteration bulk upload with it's unique stock name, i.e, something like this:

             stock_name Open    High       Low        Close   Adj   Close    Volume
    2024-02-20  ZS  14.500000   14.950000   14.490000   14.700000   14.700000   30253100.0
    2024-02-21  ZS  14.590000   14.860000   14.570000   14.790000   14.790000   23032400.0
    2024-02-22  ZS  14.940000   15.280000   14.890000   15.240000   15.240000   35702500.0
    2024-02-23  ZS  15.150000   15.290000   14.950000   15.130000   15.130000   22914900.0
    2024-02-26  ZS  15.130000   15.480000   15.130000   15.280000   15.280000   23675800.0
...    ...      ...     ...          ...      ...     ...   ...    
    2024-02-20  AAPL    14.500000   14.950000   14.490000   14.700000   14.700000   30253100.0
    2024-02-21  AAPL    14.590000   14.860000   14.570000   14.790000   14.790000   23032400.0
    2024-02-22  AAPL    14.940000   15.280000   14.890000   15.240000   15.240000   35702500.0
    2024-02-23  AAPL    15.150000   15.290000   14.950000   15.130000   15.130000   22914900.0
    2024-02-26  AAPL    15.130000   15.480000   15.130000   15.280000   15.280000   23675800.0

package versions:

!pip show pandas.  #1.5.3
!pip show beautifulsoup4  #4.12.3
!pip show pandas-datareader  #0.10.0

Could you please assist on how to accomplish this?


Solution

  • Seems like you're adding a stock name in the wrong place and at the wrong time. This is the DataFrame returned by web.DataReader where you have to add the stock name either like data["stock_name"] = i or with the DataFrame.insert method to place the new column on the left:

           ...
           # fetch information by stock name
           data = web.DataReader(i,start,end)
    
           # insert the stock name to the left of the data
           data.insert(0, "stock_name", i)
    
           # append rows to empty dataframe
           sic_market_df = sic_market_df.append(data)
           ...
    

    Also, it looks reasonable to use pandas.concat instead of the deprecated append method. For example:

    def get_data(stock_names, start, end):
        for stock in tqdm(stock_names):
            try:
                yfin.pdr_override()
                df = web.DataReader(stock, start, end)
                df.insert(0, "stock_name", stock)
                yield df
            except KeyError:
                pass
            
    sic_market_df = pd.concat(get_data(stock_names, start, end))