Search code examples
pythonpandasdataframeseries

Multiply dataframe with series having index duplicates and excluding one column


A shortened version of my dataframe looks like this:

df_crop = pd.DataFrame({
    'Name' : ['Crop1', 'Crop1', 'Crop1', 'Crop1', 'Crop2', 'Crop2', 'Crop2', 'Crop2'],
    'Type' : ['Area', 'Diesel', 'Fert', 'Pest', 'Area', 'Diesel', 'Fert', 'Pest'],
    'GHG':   [14.9, 0.0007, 0.145, 0.1611, 2.537, 0.011, 0.1825, 0.115],
    'Acid':  [0.0125, 0.0005, 0.0029, 0.0044, 0.013, 0.00014, 0.0033, 0.0055],
    'Terra Eutro': [0.053, 0.0002, 0.0077, 0.0001, 0.0547, 0.00019, 0.0058, 0.0002]
})

I now need to normalise all values in the dataframe with yield, which is different per crop, but not per Type:

s_yield = pd.Series([0.388, 0.4129], 
                    index=['Crop1', 'Crop2'])

I need to preserve the information in 'Type'. If I try to use .mul() I receive an error due to the duplicated indices: ValueError: cannot reindex from a duplicate axis.

The only other idea I have is using .loc() but I have a lot of columns (16 with values to normalise) and nothing efficient came to mind. Any suggestions?

Edit: The following table might help to show what I try to achieve: enter image description here


Solution

  • Set the index for df_crop, and multiply with the series, aligning on the relevant level:

    temp = df_crop.set_index(['Name', 'Type'])
    
    temp.mul(s_yield, level='Name', axis = 0).reset_index()
    
        Name    Type       GHG      Acid  Terra Eutro
    0  Crop1    Area  5.781200  0.004850     0.020564
    1  Crop1  Diesel  0.000272  0.000194     0.000078
    2  Crop1    Fert  0.056260  0.001125     0.002988
    3  Crop1    Pest  0.062507  0.001707     0.000039
    4  Crop2    Area  1.047527  0.005368     0.022586
    5  Crop2  Diesel  0.004542  0.000058     0.000078
    6  Crop2    Fert  0.075354  0.001363     0.002395
    7  Crop2    Pest  0.047483  0.002271     0.000083