Tags: python, pandas, dataframe, constraints, normalization

Python: Constraint to normalize weights, such that no weight is greater than 1/sqrt(n)


I have the following dataframe:

import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df:

    a   b
 0  1   3
 1  2   4

I have a sample size N=5. I want to normalize the weights in the dataframe using

df.div(df.sum(axis=1), axis=0)

and enforce the constraint that no weight is greater than 1/sqrt(N).

Can this be done in one line?


Solution

  • To normalize and ensure that no value is greater than a reference, take the maximum of the normalized values and rescale so that this maximum equals the reference:

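    import numpy as np
    N = 5  # 1/np.sqrt(N) = 0.447214
    # Step 1: normalize each row so that it sums to 1
    df2 = df.div(df.sum(axis=1), axis=0)
    # Step 2: rescale so that the largest value equals 1/np.sqrt(N)
    df2 = df2.div(df2.values.max()*np.sqrt(N))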
    

    Output:

              a         b
    0  0.149071  0.447214
    1  0.198762  0.397523
    

    This takes two steps, hence two lines, because the second step depends on the result of the first.

    Can you do it in one line? Yes, but should you?

    By performing the same computation twice: inefficient

    N = 5
    df2 = df.div(df.sum(axis=1), axis=0).div(df.div(df.sum(axis=1), axis=0).values.max()*np.sqrt(N))
    

    By using an assignment expression (the := operator, Python 3.8+): not as readable

    N = 5
    df2 = (df2:=df.div(df.sum(axis=1), axis=0)).div(df2.values.max()*np.sqrt(N))
    

    I would stick with the two lines.
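
If a single expression is still wanted, one possible alternative (not part of the answer above) is `DataFrame.pipe`, which passes the row-normalized frame to a function so that it is computed only once. The sketch below assumes the same `df` and `N` as in the question; the names `cap` and `row_sums` are introduced here purely for illustration. It also checks the constraint and shows one side effect worth knowing: because the whole frame is rescaled by a single factor, the rows no longer sum to 1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
N = 5
cap = 1 / np.sqrt(N)  # illustrative name: upper bound for any single weight

# Row-normalize, then rescale the whole frame so its largest entry equals the cap.
# pipe() hands the intermediate frame to the lambda, so it is not computed twice.
df2 = df.div(df.sum(axis=1), axis=0).pipe(lambda d: d.mul(cap / d.to_numpy().max()))

# The constraint holds: no weight exceeds the cap (small tolerance for rounding).
assert df2.to_numpy().max() <= cap + 1e-12

# Side effect of the global rescale: every row now sums to the same value below 1.
row_sums = df2.sum(axis=1)
```

Whether this reads better than the two-line version is a matter of taste; it mainly avoids both the repeated computation and the assignment expression.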