Search code examples
pythonrowaddition

How to add rows in a dataframe by a column condition python


I have a dataframe like that:

df = pd.DataFrame({'year': [2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022],

'month': [1,2,3,1,2,3,4,5,6,7,8,9,10,11,1,2,3,4,5],
'client':[1,1,1,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3],
'total':[10,20,30,55,4,64,88,5,64,32,84,24,69,70,54,11,37,98,52]})
 df

    year  month  client  total
0   2022      1       1     10
1   2022      2       1     20
2   2022      3       1     30
3   2022      1       2     55
4   2022      2       2      4
5   2022      3       2     64
6   2022      4       2     88
7   2022      5       2      5
8   2022      6       2     64
9   2022      7       2     32
10  2022      8       2     84
11  2022      9       2     24
12  2022     10       2     69
13  2022     11       2     70
14  2022      1       3     54
15  2022      2       3     11
16  2022      3       3     37
17  2022      4       3     98
18  2022      5       3     52

I would like to have for all the clients 12 months, so I need to add this rows and assign the value 0 to the column total. For this new rows, the value of the year column is a copy of the rows that have filled the month up.

Desire output:

df = pd.DataFrame({'year': [2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,20 
 22,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022],

'month': [1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9,10,11,12],
'client':[1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3],
'total': 
[10,20,30,0,0,0,0,0,0,0,0,0,55,4,64,88,5,64,32,84,24,69,70,0,54,11,37,98,52,0,0,0,0,0,0,0]})
df
    year  month  client  total  
0   2022      1       1     10
1   2022      2       1     20
2   2022      3       1     30
3   2022      4       1      0
4   2022      5       1      0
5   2022      6       1      0
6   2022      7       1      0
7   2022      8       1      0
8   2022      9       1      0
9   2022     10       1      0
10  2022     11       1      0
11  2022     12       1      0
12  2022      1       2     55
13  2022      2       2      4
14  2022      3       2     64
15  2022      4       2     88
16  2022      5       2      5
17  2022      6       2     64
18  2022      7       2     32
19  2022      8       2     84
20  2022      9       2     24
21  2022     10       2     69
22  2022     11       2     70
23  2022     12       2      0
24  2022      1       3     54
25  2022      2       3     11
26  2022      3       3     37
27  2022      4       3     98
28  2022      5       3     52
29  2022      6       3      0
30  2022      7       3      0
31  2022      8       3      0
32  2022      9       3      0
33  2022     10       3      0
34  2022     11       3      0
35  2022     12       3      0

Maybe by the merge option there is a solution? I've tried but without a good result


Solution

  • It's not the easiest way but it works with pivot_table:

    out = (df.pivot_table(index=['year', 'client'], columns='month',
                          values='total', fill_value=0.)
             .reindex(columns=range(1, 13), fill_value=0.)
             .stack().swaplevel('month', 'client')
             .rename('total').reset_index())
    

    Output:

    >>> out
        year  month  client  total
    0   2022      1       1   10.0
    1   2022      2       1   20.0
    2   2022      3       1   30.0
    3   2022      4       1    0.0
    4   2022      5       1    0.0
    5   2022      6       1    0.0
    6   2022      7       1    0.0
    7   2022      8       1    0.0
    8   2022      9       1    0.0
    9   2022     10       1    0.0
    10  2022     11       1    0.0
    11  2022     12       1    0.0
    12  2022      1       2   55.0
    13  2022      2       2    4.0
    14  2022      3       2   64.0
    15  2022      4       2   88.0
    16  2022      5       2    5.0
    17  2022      6       2   64.0
    18  2022      7       2   32.0
    19  2022      8       2   84.0
    20  2022      9       2   24.0
    21  2022     10       2   69.0
    22  2022     11       2   70.0
    23  2022     12       2    0.0
    24  2022      1       3   54.0
    25  2022      2       3   11.0
    26  2022      3       3   37.0
    27  2022      4       3   98.0
    28  2022      5       3   52.0
    29  2022      6       3    0.0
    30  2022      7       3    0.0
    31  2022      8       3    0.0
    32  2022      9       3    0.0
    33  2022     10       3    0.0
    34  2022     11       3    0.0
    35  2022     12       3    0.0
    

    After the pivot your dataframe looks like:

    >>> (df.pivot_table(index=['year', 'client'], columns='month',
                        values='total', fill_value=0)
           .reindex(columns=range(1, 13), fill_value=0))
    
    month          1     2     3     4     5     6     7     8     9     10    11  12
    year client                                                                      
    2022 1       10.0  20.0  30.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0
         2       55.0   4.0  64.0  88.0   5.0  64.0  32.0  84.0  24.0  69.0  70.0   0
         3       54.0  11.0  37.0  98.0  52.0   0.0   0.0   0.0   0.0   0.0   0.0   0