
Specific calculations for unique column values in DataFrame


I want to compute a beta in my dataframe, where beta = Σ(daily return - mean daily return) * (daily market return - mean market return) / Σ(daily market return - mean market return)**2

But I want my beta calculation to apply to specific firms. In my dataframe, each firm has an ID code (specified in column 1), and I want each ID code to be associated with its own beta.

I tried groupby, loc and a for loop, but I always get an error, since the beta calculation is quite long and needs many parentheses when written inline.

Any idea how to solve this problem? Thank you!

Dataframe:

index ID price  daily_return  mean_daily_return_per_ID   daily_market_return    mean_daily_market_return     date
0     1   27.50   0.008         0.0085                 0.0023                   0.03345                      01-12-2012
1     2   33.75   0.0745        0.0745                 0.00458                  0.0895                       06-12-2012 
2     3   29.20   0.00006       0.00006                0.0582                   0.0045                       01-05-2013
3     4   20.54   0.00486       0.005125               0.0009                   0.0006                       27-11-2013
4     1   21.50   0.009         0.0085                 0.0846                   0.04345                      04-05-2014
5     4   22.75   0.00539       0.005125               0.0003                   0.0006


Solution

  • I assume the following form of your equation is what you intended.

    beta = Σ(DR_i - mean(DR)) * (DMR_i - mean(DMR)) / Σ(DMR_i - mean(DMR))^2, computed per ID over its rows (DR = daily_return, DMR = daily_market_return).

    Then the following should compute the beta value for each group identified by ID.

    Method 1: Creating our own function to output beta

    import pandas as pd
    import numpy as np
    
    # beta_data.csv is a csv version of the sample data frame you
    # provided.
    df = pd.read_csv("./beta_data.csv")
    
    
    def beta(daily_return, daily_market_return):
        """
        Returns the beta calculation for two pandas columns of equal length.
        Will return NaN for columns that have just one row each. Adjust
        this function to account for groups that have only a single value.
        """
        mean_daily_return = daily_return.mean()
        mean_daily_market_return = daily_market_return.mean()
        num = np.sum(
            (daily_return - mean_daily_return)
            * (daily_market_return - mean_daily_market_return)
        )
        denom = np.sum((daily_market_return - mean_daily_market_return) ** 2)
        return num / denom
    
    
    # Group by the column ID, then 'apply' the function above to the two
    # desired columns (selected as a list; tuple selection is an error in
    # recent pandas versions)
    betas = df.groupby("ID")[["daily_return", "daily_market_return"]].apply(
        lambda x: beta(x["daily_return"], x["daily_market_return"])
    )
    
    print(f"betas: {betas}")
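If you don't have beta_data.csv handy, the sample frame can be built inline from the values in the question and fed through the same function (the date column is omitted here since it plays no role in the calculation, and the missing date in the last row stays missing):

```python
import numpy as np
import pandas as pd

# Sample data transcribed from the question's dataframe.
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 1, 4],
        "price": [27.50, 33.75, 29.20, 20.54, 21.50, 22.75],
        "daily_return": [0.008, 0.0745, 0.00006, 0.00486, 0.009, 0.00539],
        "daily_market_return": [0.0023, 0.00458, 0.0582, 0.0009, 0.0846, 0.0003],
    }
)


def beta(daily_return, daily_market_return):
    """Sum-of-deviations form of beta, as in Method 1."""
    num = np.sum(
        (daily_return - daily_return.mean())
        * (daily_market_return - daily_market_return.mean())
    )
    denom = np.sum((daily_market_return - daily_market_return.mean()) ** 2)
    return num / denom


betas = df.groupby("ID")[["daily_return", "daily_market_return"]].apply(
    lambda g: beta(g["daily_return"], g["daily_market_return"])
)
print(betas)
```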
    

    Method 2: Using pandas' builtin statistical functions

    Notice that beta as stated above is just the covariance of DR and DMR divided by the variance of DMR (pandas uses ddof=1 for both, and the n - 1 factors cancel in the ratio). Therefore we can write the above program much more concisely as follows.

    import pandas as pd
    import numpy as np
    
    df = pd.read_csv("./beta_data.csv")
    
    
    def beta(dr, dmr):
        """
        dr: daily_return (pandas Series)
        dmr: daily_market_return (pandas Series)

        TODO: handle the division-by-zero / single-row errors.
        """
        num = dr.cov(dmr)
        denom = dmr.var()
        return num / denom
    
    
    betas = df.groupby("ID")[["daily_return", "daily_market_return"]].apply(
        lambda x: beta(x["daily_return"], x["daily_market_return"])
    )
    
    print(f"betas: {betas}")
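A quick sanity check that the two methods agree, using ID 1's two rows from the sample: the explicit sum-of-deviations form and cov/var both use the same n - 1 denominator, so the factors cancel.

```python
import pandas as pd

# ID 1's two rows from the sample data
dr = pd.Series([0.008, 0.009])
dmr = pd.Series([0.0023, 0.0846])

# Explicit sum-of-deviations form (Method 1)
explicit = (
    ((dr - dr.mean()) * (dmr - dmr.mean())).sum()
    / ((dmr - dmr.mean()) ** 2).sum()
)
# cov/var form (Method 2): both use ddof=1, so the (n - 1) factors cancel
concise = dr.cov(dmr) / dmr.var()

assert abs(explicit - concise) < 1e-12
print(explicit)  # ≈ 0.012151
```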
    

    The output in both cases is:

    ID
    1    0.012151
    2         NaN
    3         NaN
    4   -0.883333
    dtype: float64
    

    The reason for getting NaNs for IDs 2 and 3 is that they each have only a single row, so the variance of the market return within the group is zero or undefined. You should modify the function beta to accommodate these corner cases.
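One way to handle those corner cases is to guard against degenerate groups explicitly and then broadcast each ID's beta back onto its rows. This is a sketch, not the only option; the column name `beta` and the function name `safe_beta` are my choices, not from the question.

```python
import numpy as np
import pandas as pd

# Sample data transcribed from the question (date/price omitted for brevity)
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 1, 4],
        "daily_return": [0.008, 0.0745, 0.00006, 0.00486, 0.009, 0.00539],
        "daily_market_return": [0.0023, 0.00458, 0.0582, 0.0009, 0.0846, 0.0003],
    }
)


def safe_beta(dr, dmr):
    """Beta = cov(dr, dmr) / var(dmr); returns NaN explicitly when the
    group has fewer than two rows or the market return shows no variation."""
    if len(dmr) < 2 or dmr.var() == 0:
        return np.nan
    return dr.cov(dmr) / dmr.var()


betas = df.groupby("ID")[["daily_return", "daily_market_return"]].apply(
    lambda g: safe_beta(g["daily_return"], g["daily_market_return"])
)

# Broadcast each ID's beta back onto its rows
df["beta"] = df["ID"].map(betas)
print(df[["ID", "beta"]])
```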