Search code examples
pythonpandassparse-matrixmultiplicationmulti-index

How to correctly match pandas multiindex dataframe multiplication for sparse data


I've searched before posting this, I've found among others this previous stack overflow post and I don't think it answers my question.

I have sparse data that I want to multiply together correctly matched by index, where the data is a multilevel index.

I have observations of different attributes for a number of element_ids on different dates, but the data is sparse:

enter image description here

This is my second array df_weight_at_date a list of weights for each element_id (python to create at bottom of post)

enter image description here

For each date, I want to multiply values together, so for example in my observed data A/1/2021-01-15 (0.87) should be multiplied by weight at date 1/2021-01-15 (0.3) for a value of 0.261

If either value is NaN then the result is NaN and the output frame will have the same shape as the df_observations dataframe.

I've tried using .multiply but get the error no ValueError: cannot join with no overlapping index names

df_observations.multiply(df_weight_at_date.unstack())

Expected output for this data

enter image description here

Bit of a newbie - would appreciate any pointers, thanks

code to create data frames


df_observations=pd.DataFrame({'observed_date':['2021-01-15','2021-01-15','2021-01-15','2021-01-15','2021-01-15','2021-01-15','2021-01-15','2021-01-15','2021-01-15','2021-01-15','2021-01-15','2021-01-15','2021-01-15','2021-01-15','2021-01-15','2021-01-15','2021-01-15','2021-01-16','2021-01-16','2021-01-16','2021-01-16','2021-01-16','2021-01-16','2021-01-16','2021-01-16','2021-01-16','2021-01-16','2021-01-16','2021-01-16','2021-01-16'],
               'element_id':[1,2,3,4,5,6,7,1,2,3,4,5,6,7,1,2,3,2,3,4,5,6,7,3,2,3,4,5,6,7],
                'factor_id':['A','A','A','A','A','A','A','B','B','B','B','B','B','B','C','C','C','A','A','A','A','A','A','F','F','B','B','B','B','B'],
               'observation':[0.87,0.84,0.15,0.6,0.17,0.76,0.03,0.91,0.05,0.38,0.06,0.27,0.92,0.27,0.16,0.71,0.32,0.92,0.88,0.53,0.79,0.15,0.3,0.16,0.36,0.05,0.22,0.73,0.7,0.9]}).pivot(index=['observed_date','element_id'], columns='factor_id', values='observation')


df_weight_at_date=pd.DataFrame({'observed_date':['2021-01-15','2021-01-15','2021-01-15',
                                 '2021-01-16','2021-01-17','2021-01-18',
                                 '2021-01-19','2021-01-20','2021-01-18'
                                ],
               'element_id':[1,3,5,1,3,5,1,3,9],
               'weight':[0.3,0.35,0.35,1,1,0.4,1,1,0.6]}).pivot(index=['element_id'], columns='observed_date', values='weight')

Solution

  • After correcting the input frames so that index names match (observation_date -> observed_date) this now works and is concise enough I think

    df_observations.multiply(df_weight_at_date.unstack(), axis=0)
    

    result enter image description here