Search code examples
pythonpython-3.xpandasnumpyfinance

Finding the max of numbers in multiple DataFrames Python


I have 1000+ .txt files with stock dates and prices that I've cast to a dictionary (with filename(stock ticker) as the key, and the data for each file as a data frame). I calculated the moving average with .rolling, then found the percent difference between the moving average and the price. So, the percent difference is its own column for each DataFrame. The Code for all this looks like this:

filelist = os.listdir(r'Insert File Path')
filepath = r'Insert File Path'


dic1 = {}

for file in filelist:
    df = pd.read_csv(filepath + file,sep='\t')
dic1[file]= df

for value in dic1.values():
    value.rename(columns={value.columns[0]:'Dates',value.columns[1]:'Prices'},inplace=True)

for value in dic1.values():
    value['ma'] = value['Prices'].rolling(window=50).mean()

for value in dic1.values():
    value['diff'] = value['Prices'] - value['ma']

for value in dic1.values():
     value['pctdiff']= value['diff']/value['Prices']

My question is how do I find the top 5 greatest (and smallest, because they can be negative) of the pctdiff columns?

I've tried:

for df in dic1.values():
    for num in df['pctdiff'].max():
        print(num.max())

but I get the following error: "'float' object is not iterable"


Solution

  • Is this what you mean?

    list_result = []
    for key,value in dic1.items():
        value.rename(columns={value.columns[0]:'Dates',value.columns[1]:'Prices'},inplace=True)
        value['ma'] = value['Prices'].rolling(window=50).mean()
        value['diff'] = value['Prices'] - value['ma']
        value['pctdiff']= value['diff']/value['Prices']
        list_result.append([key,value['pctdiff'].max()])
    
    list_result.sort(key = lambda x : x[1] )
    highest_list = list_result[-5:]
    smallest_list = list_result[:5]