I have DataFrame in Python Pandas like below:
Data types:
MONTH_NR - numeric
MONTH_NAME - object
VALUE - numeric
MONTH_NR | MONTH_NAME | VALUE |
---|---|---|
1 | JANUARY | 10 |
2 | FEBRYARY | 20 |
3 | MARCH | 15 |
4 | APRIL | 10 |
5 | MAY | 11 |
6 | JUN | 100 |
7 | JULY | 200 |
8 | AUGUST | 12 |
9 | SEPTEMBER | 20 |
10 | OCTOBER | 50 |
11 | NOVEMBER | 30 |
12 | DECEMBER | 50 |
And I need to add 3 new ROWS where will be:
So as a result I need somethin like below:
MONTH_NR | MONTH_NAME | VALUE
---------|-------------|---------
1 | JANUARY | 10
2 | FEBRYARY | 20
3 | MARCH | 15
4 | APRIL | 10
5 | MAY | 11
6 | JUN | 100
SUM_AFTER_1_6| | 166
7 | JULY | 200
8 | AUGUST | 12
9 | SEPTEMBER | 20
10 | OCTOBER | 50
11 | NOVEMBER | 30
12 | DECEMBER | 50
SUM_AFTER_7_12| | 362
SUM_ALL | | 528
How can I do that in Python Pandas ?
Here a nested list aaa
is created in which the name and two indexes for the range in which the sums will be turn out through loc
(note that slices in loc are taken inclusive. For example df.loc[0:5, 'VALUE']
would take rows 0 through 5, that is six rows.).
Further, in the List comprehension
, a nested list is also created where the name, an empty string and the sum in each value inside the list. With np.insert
, rows are inserted and the dataframe is overwritten and the original column names are set.
import pandas as pd
import numpy as np
df = pd.read_csv('df1.csv', header=0)
aaa = [['SUM_AFTER_1_6', 0, 5], ['SUM_AFTER_7_12', 6, 11], ['SUM_ALL', 0, 11]]
bbb = [[aaa[i][0], '', df.loc[aaa[i][1]:aaa[i][2], 'VALUE'].sum()] for i in range(len(aaa))]
df = pd.DataFrame(np.insert(df.values, [6, 12, 12], values=bbb, axis=0))
df.rename(columns={0: 'MONTH_NR', 1: 'MONTH_NAME', 2: 'VALUE'}, inplace=True)
print(df)
Output
MONTH_NR MONTH_NAME VALUE
0 1 JANUARY 10
1 2 FEBRYARY 20
2 3 MARCH 15
3 4 APRIL 10
4 5 MAY 11
5 6 JUN 100
6 SUM_AFTER_1_6 166
7 7 JULY 200
8 8 AUGUST 12
9 9 SEPTEMBER 20
10 10 OCTOBER 50
11 11 NOVEMBER 30
12 12 DECEMBER 50
13 SUM_AFTER_7_12 362
14 SUM_ALL 528