I am attempting to create a multi index dataframe which contains every possible index even ones where it does not currently contain values. I wish to set these non-existent values to 0. To achieve this, I used the following:
index_levels = ['Channel', 'Duration', 'Designation', 'Manufacturing Class']
grouped_df = df.groupby(by = index_levels)[['Total Purchases', 'Sales', 'Cost']].agg('sum')
grouped_df = grouped_df.reindex(pd.MultiIndex.from_product(grouped_df.index.levels), fill_value = 0)
The expected result:
___________________________________________________________________________________________
|Chan. | Duration | Designation| Manufact. |Total Purchases| Sales | Cost |
|______|____________|____________|______________|_______________|_____________|_____________|
| | Month | Special | Brand | 0 | 0.00 | 0.00 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 0 | 0.00 | 0.00 |
|Retail| |____________|______________|_______________|_____________|_____________|
| | |Not Special | Brand | 756 | 15654.07 | 9498.23 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 7896 | 98745.23 | 78953.56 |
| |____________|____________|______________|_______________|_____________|_____________|
| | Season | Special | Brand | 0 | 0.00 | 0.00 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 0 | 0.00 | 0.00 |
| | |____________|______________|_______________|_____________|_____________|
| | |Not Special | Brand | 0 | 0.00 | 0.00 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 0 | 0.00 | 0.00 |
|______|____________|____________|______________|_______________|_____________|_____________|
This result is produced when at least one of the index levels contains a value. However, if the index level does not contain any value, then the following result is produced below.
___________________________________________________________________________________________
|Chan. | Duration | Designation| Manufact. |Total Purchases| Sales | Cost |
|______|____________|____________|______________|_______________|_____________|_____________|
| | Month | Not Special| Brand | 756 | 15654.07 | 9498.23 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 7896 | 98745.23 | 78953.56 |
|Retail|____________|____________|______________|_______________|_____________|_____________|
| | Season |Not Special | Brand | 0 | 0.00 | 0.00 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 0 | 0.00 | 0.00 |
|______|____________|____________|______________|_______________|_____________|_____________|
For some reason, the values continue to be autotruncated. How can I fix indices so that the desired result is always produced and I can always reliably use these indices for calculations, even when said indices have no values in them?
What you can do is construct the desired fixed index beforehand. For instance, based on a dictionary where the keys are the columns labels used as group index, and the values are all the possible outcomes.
index_levels = {
'Channel': ['Retails'],
'Duration': ['Month', 'Season'],
'Designation': ['Special', 'Not Special'],
'Manufacturing Class': ['Brand', 'Generic']
}
fixed_index = pd.MultiIndex.from_product(index_levels.values(), names=index_levels.keys())
Then you can do
grouped_df = df.groupby(by=index_levels.keys())[['Total Purchases', 'Sales', 'Cost']].agg('sum')
grouped_df = grouped_df.reindex(fixed_index, fill_value=0)