Search code examples
pythonpandasmulti-index

Is there a way to remove autotruncation from pandas dataframe?


I am attempting to create a multi index dataframe which contains every possible index even ones where it does not currently contain values. I wish to set these non-existent values to 0. To achieve this, I used the following:

index_levels = ['Channel', 'Duration', 'Designation', 'Manufacturing Class']

grouped_df = df.groupby(by = index_levels)[['Total Purchases', 'Sales', 'Cost']].agg('sum')

grouped_df = grouped_df.reindex(pd.MultiIndex.from_product(grouped_df.index.levels), fill_value = 0)

The expected result:

 ___________________________________________________________________________________________ 
|Chan. | Duration   | Designation|    Manufact. |Total Purchases|  Sales      |   Cost      |
|______|____________|____________|______________|_______________|_____________|_____________|
|      | Month      | Special    |    Brand     |     0         |    0.00     |   0.00      |
|      |            |            |______________|_______________|_____________|_____________|
|      |            |            |    Generic   |     0         |    0.00     |   0.00      |
|Retail|            |____________|______________|_______________|_____________|_____________|
|      |            |Not Special |    Brand     |     756       | 15654.07    |   9498.23   |
|      |            |            |______________|_______________|_____________|_____________|
|      |            |            |    Generic   |     7896      |  98745.23   |    78953.56 |
|      |____________|____________|______________|_______________|_____________|_____________|
|      | Season     | Special    |    Brand     |     0         |  0.00       |    0.00     |
|      |            |            |______________|_______________|_____________|_____________|
|      |            |            |    Generic   |     0         |  0.00       |    0.00     |
|      |            |____________|______________|_______________|_____________|_____________|
|      |            |Not Special |    Brand     |     0         |  0.00       |    0.00     |
|      |            |            |______________|_______________|_____________|_____________|
|      |            |            |    Generic   |     0         |  0.00       |    0.00     |
|______|____________|____________|______________|_______________|_____________|_____________|

This result is produced when at least one of the index levels contains a value. However, if the index level does not contain any value, then the following result is produced below.

___________________________________________________________________________________________ 
|Chan. | Duration   | Designation|    Manufact. |Total Purchases|  Sales      |   Cost      |
|______|____________|____________|______________|_______________|_____________|_____________|
|      | Month      | Not Special|    Brand     |     756       |  15654.07   |   9498.23   |
|      |            |            |______________|_______________|_____________|_____________|
|      |            |            |    Generic   |    7896       | 98745.23    |   78953.56  |
|Retail|____________|____________|______________|_______________|_____________|_____________|
|      | Season     |Not Special |    Brand     |       0       |    0.00     |     0.00    |
|      |            |            |______________|_______________|_____________|_____________|
|      |            |            |    Generic   |       0       |    0.00     |     0.00    |
|______|____________|____________|______________|_______________|_____________|_____________|

For some reason, the values continue to be autotruncated. How can I fix indices so that the desired result is always produced and I can always reliably use these indices for calculations, even when said indices have no values in them?


Solution

  • What you can do is construct the desired fixed index beforehand. For instance, based on a dictionary where the keys are the columns labels used as group index, and the values are all the possible outcomes.

    index_levels = {
        'Channel': ['Retails'], 
        'Duration': ['Month', 'Season'], 
        'Designation': ['Special', 'Not Special'], 
        'Manufacturing Class': ['Brand', 'Generic']
    }
    
    fixed_index = pd.MultiIndex.from_product(index_levels.values(), names=index_levels.keys())
    

    Then you can do

    grouped_df = df.groupby(by=index_levels.keys())[['Total Purchases', 'Sales', 'Cost']].agg('sum')
    
    grouped_df = grouped_df.reindex(fixed_index, fill_value=0)