Search code examples
pythonpandasindexingdataframeindices

I am looking at each individual date in a Pandas dataframe and adjusting one column (weight), based on condition on another column for each date


I have a dataframe (df) with 10 columns. The index has many different dates, however there are multiple identical dates (and it is sorted by date). Additionally the important columns for this problem are df['Weight'] and df['Price'].

Here is an example of the data with 2 columns for only 1 index value (1/21/2017), in reality there are multiple dates with multiple weights etc.

          Weight    Price
1/21/2017   0.1     12
1/21/2017   0.04    21
1/21/2017   0.03    13
1/21/2017   0.02    22
1/21/2017   0.2     27
1/21/2017   0.001   30
1/21/2017   0.1     34
1/21/2017   0.21    21
1/21/2017   0.003   12
1/21/2017   0.01    32
1/21/2017   0.04    21
1/21/2017   0.005   12
1/21/2017   0.05    10
1/21/2017   0.1      3
1/21/2017   0.091   24

The weights add up to 1 for a particular index, and this is true for every unique date of the index.

Now I am trying to create a column df['adjusted weight'] which will be based off the percent rank of each value of Price for a particular date.

The price which is in the middle 80% relative to other prices for a particular date, the df['adjusted weight'] would be equal to df['Weight'].

For the bottom 10% of Prices for a particular date, df['Adjusted weight'], would be df['Weight'] / 2 .

For the top 10% of Prices for any date, We need to adjust the Adjusted Weight so that the new df[adjusted weight'] for that date would now equal 1.

This would be done by dividing all the weights for where Price is in top 10% for a date by the current sum of weight of the top 10% Prices, and then multiplying by this by that number itself + the sum of the weight which was chopped off from the bottom 10%, to make it 1 again.

Here is the output I desire:

          Weight    Price   Percent rank    Adjusted Weight
1/21/2017   0.1     12  0.142   0.1
1/21/2017   0.04    21  0.428   0.04
1/21/2017   0.03    13  0.357   0.03
1/21/2017   0.02    22  0.642   0.02
1/21/2017   0.2     27  0.785   0.2
1/21/2017   0.001   30  0.857   0.001
1/21/2017   0.1     34  1       **0.168181818**
1/21/2017   0.21    21  0.428   0.21
1/21/2017   0.003   12  0.142   0.003
1/21/2017   0.01    32  0.928   **0.016818182**
1/21/2017   0.04    21  0.428   0.04
1/21/2017   0.005   12  0.142   0.005
1/21/2017   0.05    10  0.071   **0.025**
1/21/2017   0.1     3   0       **0.05**
1/21/2017   0.091   24  0.714   0.091

I have put the ** around the values which have changed, and coincedentally there are 2 values at bottom 10% pct rank and 2 values at top 10% rank. For the bottom 2 values, its just weight/2 , and for top 2 I wrote this formula in excel, =E8/(E11+E8)*(E11+E8+E15+E14-(H15+H14)) where tha table spans from D1 to H16.

Its a pretty tricky problem, but I hope to have asked in reasonable manner. If somebody could advise and help I would very much appreciate it. Thank you for reading and love to hear the ideas which I get. Additionally the dataset is large, so I am not sure how long it would take as it would need to do the same work for each date.

I would like for the adjusted weight to be a new column which exist alongside the original weights please. Additionally something which would work with other columns as well as price would be great.

Thanks all so much for the help and support.

Best wishes.


Solution

  • def adjust(df):
        df = df.copy()
        b, t = df.Price.quantile([.1, .9])
        ltb = df.Price.lt(b)
        get = df.Price.ge(t)
        bsum = df.Weight[ltb].sum()
        tsum = df.Weight[get].sum()
        df.loc[ltb, 'Weight'] /= 2
        df.loc[get, 'Weight'] *= (bsum / 2 + tsum) / tsum
        return df
    
    df.groupby(level=0).apply(adjust).reset_index(0, drop=True)
    
                  Weight  Price
    2017-01-21  0.100000     12
    2017-01-21  0.040000     21
    2017-01-21  0.030000     13
    2017-01-21  0.020000     22
    2017-01-21  0.200000     27
    2017-01-21  0.001000     30
    2017-01-21  0.168182     34
    2017-01-21  0.210000     21
    2017-01-21  0.003000     12
    2017-01-21  0.016818     32
    2017-01-21  0.040000     21
    2017-01-21  0.005000     12
    2017-01-21  0.025000     10
    2017-01-21  0.050000      3
    2017-01-21  0.091000     24