Using the input below as an example, I am trying to create an aggregated column in a dataframe in Python based on unique instances of others. The best attempt I can make leaves some NaN in the new column though
raw_data = {'RegionCode' : ['10001', '10001', '10001', '10001', '10001', '10001', '10002', '10002', '10002', '10002', '10002', '10002'],
'Stratum' : ['1', '1','2','2','3', '3', '1', '1', '2', '2', '3', '3'],
'LaStratum' : ['1021', '1021', '1022', '1022', '1023', '1023', '2021', '2021', '2022', '2022', '2023', '2023'],
'StratumPop' : [125, 125, 50, 50, 100, 100, 250, 250, 200, 200, 300, 300],
'Q_response' : [2, 1, 4, 1, 2, 2, 3, 4, 3, 2, 1, 4]}
Data = pd.DataFrame(raw_data, columns = ['RegionCode', 'Stratum', 'LaStratum', 'StratumPop', 'Q_response'])
#Sum StratumPop by unique instance of LaStratum at RegionCode level
Data['Total_Pop'] = Data.drop_duplicates(['LaStratum']).groupby('RegionCode')['StratumPop'].transform('sum')
Data
What I am trying to do is sum the StratumPop column at RegionCode level by each unique instance of LaStratum. The totals produced are correct but how can I 'fill' the column to repeat each total instead of just seeing the first occurence of each different total and NaN for the others? So Region 10001 has 275 on every row and Region 10002 has 750 on each row. Is this possible without creating staging tables and merging unique values back in (as I'm currently doing)?
To fill the column and repeat each Total_Pop per Region, you can use a simple grouped (by Region per se) ffill()
:
Data['Total_Pop_new'] = Data.groupby('RegionCode')['Total_Pop'].ffill()
Will give you back:
Data
RegionCode Stratum LaStratum ... Q_response Total_Pop Total_Pop_new
0 10001 1 1021 ... 2 275.0 275.0
1 10001 1 1021 ... 1 NaN 275.0
2 10001 2 1022 ... 4 275.0 275.0
3 10001 2 1022 ... 1 NaN 275.0
4 10001 3 1023 ... 2 275.0 275.0
5 10001 3 1023 ... 2 NaN 275.0
6 10002 1 2021 ... 3 750.0 750.0
7 10002 1 2021 ... 4 NaN 750.0
8 10002 2 2022 ... 3 750.0 750.0
9 10002 2 2022 ... 2 NaN 750.0
10 10002 3 2023 ... 1 750.0 750.0
11 10002 3 2023 ... 4 NaN 750.0