Search code examples
pythonpandasspatial-data-frame

Python Pandas: Spread total values in dataframe based on percentages back to categories


I have the below dataframe and I would like to spread the total values in the val columns back to the Cat column based on the percentages in the Percentage column.

    Country State   City    Cat Total amount    Percentage  val1    val2    val3    val4

    US        FL    MIA     cat1     100           10 
    US        FL    MIA     cat2     850           85
    US        FL    MIA     cat3     50            5
    US        FL    MIA     tot cat  1000          100      200       30    400     120
    US        GA    ATL     cat1     200           40       
    US        GA    ATL     cat2     300           60       
    US        GA    ATL     tot cat  500           100      100       50    20      30
    US        NY    NY      tot cat  100           100      0         20    5       15
    Canada    MB    WPG     cat1     250           50
    Canada    MB    WPG     cat2     250           50
    Canada    MB    WPG     tot cat  500           100      50        550   40      160
    Canada    QC    YUL     cat1     500           33,33
    Canada    QC    YUL     cat2     1000          66,66
    Canada    QC    YUL     tot cat  1500          100      1000      250   500     60

I got the total line in, also I managed to put in the percentages per group and merging two dataframes based on a few columns went fine but now im stuck...

Desired output:

    Country State   City    Cat Total amount    Percentage  val1    val2    val3    val4

    US        FL    MIA     cat1     100           10       20        3     40      12
    US        FL    MIA     cat2     850           85       170       25,5  340     102
    US        FL    MIA     cat3     50            5        10        1,5   20      6
    US        FL    MIA     tot cat  1000          100      200       30    400     120
    US        GA    ATL     cat1     200           40       40        20    8       12      
    US        GA    ATL     cat2     300           60       60        30    12      18      
    US        GA    ATL     tot cat  500           100      100       50    20      30
    US        NY    NY      tot cat  100           100      0         20    5       15
    Canada    MB    WPG     cat1     250           50       25        275   20      80
    Canada    MB    WPG     cat2     250           50       25        275   20      80
    Canada    MB    WPG     tot cat  500           100      50        550   40      160
    Canada    QC    YUL     cat1     500           33,33    333,33    83,32 166,65  19,99   
    Canada    QC    YUL     cat2     1000          66,66    666,66    166,65333,33  39,99
    Canada    QC    YUL     tot cat  1500          100      1000      250   500     60

Or should I calculated / spread the values when I do the merge? I need help...

EDIT: I provided the input data.

    pd.DataFrame([{'Country': 'US', 'State': 'FL', 'City': 'MIA', 'Cat': 'cat1', 'Total Amount': 100, 'Percentage': 10 }, 
        {'Country': 'US', 'State': 'FL', 'City': 'MIA', 'Cat': 'cat2','Total Amount': 850, 'Percentage': 85 },
        {'Country': 'US', 'State': 'FL', 'City': 'MIA', 'Cat': 'cat3','Total Amount': 50, 'Percentage': 5 },
        {'Country': 'US', 'State': 'FL', 'City': 'MIA', 'Cat': 'Tot Cat','Total Amount': 1000, 'Percentage': 100, 'val1': 200, 'val2': 30, 'val3': 400, 'val4': 120},
        {'Country': 'US', 'State': 'GA', 'City': 'ATL', 'Cat': 'cat1','Total Amount': 200, 'Percentage': 40 },
        {'Country': 'US', 'State': 'GA', 'City': 'ATL', 'Cat': 'cat2','Total Amount': 300, 'Percentage': 60 },
        {'Country': 'US', 'State': 'GA', 'City': 'ATL', 'Cat': 'Tot Cat','Total Amount': 500, 'Percentage': 100, 'val1': 100, 'val2': 50, 'val3': 20, 'val4': 30},
        {'Country': 'US', 'State': 'NY', 'City': 'ATL', 'Cat': 'Tot Cat','Total Amount': 100, 'Percentage': 100, 'val1': 0, 'val2': 20, 'val3': 5, 'val4': 15},
        {'Country': 'Canada', 'State': 'MB', 'City': 'WPG', 'Cat': 'cat1', 'Total Amount': 250, 'Percentage': 50 },
        {'Country': 'Canada', 'State': 'MB', 'City': 'WPG', 'Cat': 'cat2', 'Total Amount': 250, 'Percentage': 50 },
        {'Country': 'Canada', 'State': 'MB', 'City': 'WPG', 'Cat': 'Tot Cat', 'Total Amount': 500, 'Percentage': 100, 'val1': 50 , 'val2': 550, 'val3': 40, 'val4': 160},
        {'Country': 'Canada', 'State': 'QC', 'City': 'YUL', 'Cat': 'cat1', 'Total Amount': 500, 'Percentage': 33.33 },
        {'Country': 'Canada', 'State': 'QC', 'City': 'YUL', 'Cat': 'cat2', 'Total Amount': 1000, 'Percentage': 66.66 },
        {'Country': 'Canada', 'State': 'QC', 'City': 'YUL', 'Cat': 'Tot Cat', 'Total Amount': 1500, 'Percentage': 100, 'val1': 1000 , 'val2': 250, 'val3': 500, 'val4': 60 }])

Solution

  • Don't know if you can do this more elegantly, but you could do it like this:

    >>> df_tot = df[df['Cat'] == 'tot cat'].reindex(index=df.index, method='backfill')
    >>> for x in df.columns:
            if 'val' in x:
                df[x] = df['Percentage'] * df_tot[x] / 100
    
       Country State City      Cat  Total_amount  Percentage    val1     val2    val3     val4
    0       US    FL  MIA     cat1           100       10.00    20.0    3.000   40.00   12.000
    1       US    FL  MIA     cat2           850       85.00   170.0   25.500  340.00  102.000
    2       US    FL  MIA     cat3            50        5.00    10.0    1.500   20.00    6.000
    3       US    FL  MIA  tot cat          1000      100.00   200.0   30.000  400.00  120.000
    4       US    GA  ATL     cat1           200       40.00    40.0   20.000    8.00   12.000
    5       US    GA  ATL     cat2           300       60.00    60.0   30.000   12.00   18.000
    6       US    GA  ATL  tot cat           500      100.00   100.0   50.000   20.00   30.000
    7       US    NY   NY  tot cat           100      100.00     0.0   20.000    5.00   15.000
    8   Canada    MB  WPG     cat1           250       50.00    25.0  275.000   20.00   80.000
    9   Canada    MB  WPG     cat2           250       50.00    25.0  275.000   20.00   80.000
    10  Canada    MB  WPG  tot cat           500      100.00    50.0  550.000   40.00  160.000
    11  Canada    QC  YUL     cat1           500       33.33   333.3   83.325  166.65   19.998
    12  Canada    QC  YUL     cat2          1000       66.66   666.6  166.650  333.30   39.996
    13  Canada    QC  YUL  tot cat          1500      100.00  1000.0  250.000  500.00   60.000