Search code examples
pythonpandasdataframeapply

Applying function to Pandas Data frame and getting result as a new data frame


I have this code:

import yfinance as yF
import datetime
import pandas as pd

df = pd.DataFrame({'ID':['1', '2'], 'Ticker': ['AIN', 'TILE'], 'Company':['Albany International', 'Interface']})


def get_returns(tic,com):
    df_Stock = yF.download(tickers = tic, period = "max", interval = "1mo", prepost = False, repair = False)
    df_Stock[com + ' % Growth'] = df_Stock['High']-df_Stock['Open']
    df_Stock[com + ' % Growth'] = (df_Stock[com + ' % Growth'] * 100 )/df_Stock['Open']
    
    return df_Stock[com + ' % Growth']

get_returns('AIN','AIN')

Everything work as expected until this point.

df1 = df.apply(lambda x: get_returns(x.Ticker, x.Company), axis=1)

Here I am trying to use this function

get_returns()

with Pandas apply and lambda on the dataframe df defined above. The desired output is another dataframe df1 which has 3 columns namely

  1. Date
  2. Company 1 - Albany International % Returns
  3. Company 2 - Interface % Returns

If there are more rows in dataframe df, each company will be a new column in df1 and its monthly return would be the row value for the given time period under the Date column.


Solution

  • This is the output I got when running your code for the following data frame

    ID Ticker Company
    0 1 AIN Albany International
    1 2 TILE Interface

    Output

    Date 1985-01-01 1985-02-01 1985-03-01 1985-04-01 1985-05-01 1985-06-01 1985-07-01 1985-08-01 1985-09-01 1985-10-01 ... 2023-01-01 2023-02-01 2023-03-01 2023-04-01 2023-05-01 2023-06-01 2023-07-01 2023-08-01 2023-09-01 2023-10-01
    0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 13.081953 3.767988 1.609221 3.527832 3.188085 11.953453 5.196635 0.891469 1.352506 0.290596
    1 inf inf inf inf inf inf inf inf inf inf ... 15.577892 8.091470 5.604387 1.231532 2.809710 27.745666 12.328766 8.771934 3.567984 1.530608

    I believe that the code that you're running creates an entry for each index in the original data frame. So row 0 corresponds to 'AIN' and row 1 corresponds to 'TILE'.

    To get it into the desired format you need to transpose df1 and rename the columns based on df.

    df1 = df1.T
    df1.rename(columns=df['Company'] + ' % Returns', inplace=True)
    df1.rename(columns={v: f"Company {k+1} - {v}" for k, v in enumerate(df1.columns)}, inplace=True)
    

    Output:

    Date Company 1 - Albany International % Returns Company 2 - Interface % Returns
    1985-01-01 NaN inf
    1985-02-01 NaN inf
    1985-03-01 NaN inf
    1985-04-01 NaN inf
    1985-05-01 NaN inf
    ... ... ...
    2023-06-01 11.953453 27.745666
    2023-07-01 5.196635 12.328766
    2023-08-01 0.891469 8.771934
    2023-09-01 1.352506 3.567984
    2023-10-01 0.290596 1.530608