Search code examples
pythonpandasaggregation

Filling aggregated column in Python


Using the input below as an example, I am trying to create an aggregated column in a dataframe in Python based on unique instances of others. The best attempt I can make leaves some NaN in the new column though

raw_data = {'RegionCode' : ['10001', '10001', '10001', '10001', '10001', '10001', '10002', '10002', '10002', '10002', '10002', '10002'],
            'Stratum' : ['1', '1','2','2','3', '3', '1', '1', '2', '2', '3', '3'],
            'LaStratum' : ['1021', '1021', '1022', '1022', '1023', '1023', '2021', '2021', '2022', '2022', '2023', '2023'],
            'StratumPop' : [125, 125, 50, 50, 100, 100, 250, 250, 200, 200, 300, 300],
            'Q_response' : [2, 1, 4, 1, 2, 2, 3, 4, 3, 2, 1, 4]}



Data = pd.DataFrame(raw_data, columns = ['RegionCode', 'Stratum', 'LaStratum', 'StratumPop', 'Q_response'])

#Sum StratumPop by unique instance of LaStratum at RegionCode level
Data['Total_Pop'] = Data.drop_duplicates(['LaStratum']).groupby('RegionCode')['StratumPop'].transform('sum')

Data

What I am trying to do is sum the StratumPop column at RegionCode level by each unique instance of LaStratum. The totals produced are correct but how can I 'fill' the column to repeat each total instead of just seeing the first occurence of each different total and NaN for the others? So Region 10001 has 275 on every row and Region 10002 has 750 on each row. Is this possible without creating staging tables and merging unique values back in (as I'm currently doing)?


Solution

  • To fill the column and repeat each Total_Pop per Region, you can use a simple grouped (by Region per se) ffill():

    Data['Total_Pop_new'] = Data.groupby('RegionCode')['Total_Pop'].ffill()
    

    Will give you back:

    Data
    
       RegionCode Stratum LaStratum  ...  Q_response  Total_Pop  Total_Pop_new
    0       10001       1      1021  ...           2      275.0          275.0
    1       10001       1      1021  ...           1        NaN          275.0
    2       10001       2      1022  ...           4      275.0          275.0
    3       10001       2      1022  ...           1        NaN          275.0
    4       10001       3      1023  ...           2      275.0          275.0
    5       10001       3      1023  ...           2        NaN          275.0
    6       10002       1      2021  ...           3      750.0          750.0
    7       10002       1      2021  ...           4        NaN          750.0
    8       10002       2      2022  ...           3      750.0          750.0
    9       10002       2      2022  ...           2        NaN          750.0
    10      10002       3      2023  ...           1      750.0          750.0
    11      10002       3      2023  ...           4        NaN          750.0