Search code examples
pythonpandasnumpysumjupyter

Create a new variable in flat tables based on existing column


I have got the following dataframe:

lst=[['01012021','A',5],['01012021','B',-10],['01012021','C',20],['01022021','A',20],['01022021','B',-30],['01022021','C',40]]
df2=pd.DataFrame(lst,columns=['Date','Art','AuM'])

I would like to create in column Art a new variable (D) by adding A, B and C. It should be grouped by the columns date and Art. The result looks like this:

lst=[['01012021','A',5],['01012021','B',-10],['01012021','C',20],['01012021','D',15],['01022021','A',20],['01022021','B',-30],['01022021','C',40],['01022021','D',30]]
df2=pd.DataFrame(lst,columns=['Date','Art','AuM'])

Thank you for your help.


Solution

  • You can groupby Date, assign 'D' as a new column Art to the result and append to the orginal dataframe:

    df2.append(df2.groupby('Date', as_index=False).AuM.sum().assign(Art='D'))
    

    If needed, you can sort by appending .sort_values(['Date', 'Art']):

           Date Art  AuM
    0  01012021   A    5
    1  01012021   B  -10
    2  01012021   C   20
    0  01012021   D   15
    3  01022021   A   20
    4  01022021   B  -30
    5  01022021   C   40
    1  01022021   D   30