Having an excel sheet for sales by product by state by location as below
New Jersey | New Jersey | California | California | California | California | New York | New York | |||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Willowbrook | Willowbrook | Great Mall | Great Mall | Westminster Mall | Westminster Mall | The Fulton Center | The Fulton Center | |||||
Transaction Date | Transaction Document | Partno | Qty | Value | Qty | Value | Qty | Value | Qty | Value | Total Qty | Total Value |
17/5/2022 | 8534 | P1 | 1 | 150.00 | 1 | 150.00 | ||||||
17/5/2022 | 8654 | P6 | 2 | 400.00 | 2 | 400.00 | ||||||
18/5/2022 | 3537 | P2 | 1 | 175.00 | 1 | 175.00 | ||||||
20/5/2022 | 2301 | P3 | 2 | 150.00 | 2 | 150.00 | ||||||
20/5/2022 | 2301 | P5 | 1 | 50.00 | 1 | 50.00 |
Need to change it to become as below
Transaction Date | Transaction Document | Partno | Qty | Value | State | Location |
---|---|---|---|---|---|---|
17/5/2022 | 8534 | P1 | 1 | 150.00 | California | Great Mall |
17/5/2022 | 8654 | P6 | 2 | 400.00 | California | Westminster Mall |
18/5/2022 | 3537 | P2 | 1 | 175.00 | New Jersey | Willowbrook |
20/5/2022 | 2301 | P3 | 2 | 150.00 | New York | The Fulton Center |
20/5/2022 | 2301 | P5 | 1 | 50.00 | New York | The Fulton Center |
There are monthly reports like this so I was looking for a python function that uses pandas or any other library to be able to pass the files and transform it as above
You can remove last 2 columns by DataFrame.iloc
, create columns names by DataFrame.set_axis
, reshape by DataFrame.stack
:
df = pd.read_excel('data.xlsx', header=[0, 1, 2], index_col=[0, 1, 2])
print (df.index)
MultiIndex([('17/5/2022 ', 8534, 'P1 '),
('17/5/2022 ', 8654, 'P6 '),
('18/5/2022 ', 3537, 'P2 '),
('20/5/2022 ', 2301, 'P3 '),
('20/5/2022 ', 2301, 'P5 ')],
)
print (df.columns)
MultiIndex([( 'New Jersey ', 'Willowbrook ', 'Qty '),
( 'New Jersey ', 'Willowbrook ', ' Value '),
( 'California ', 'Great Mall ', 'Qty '),
( 'California ', 'Great Mall ', ' Value '),
( 'California ', 'Westminster Mall ', 'Qty '),
( 'California ', 'Westminster Mall ', ' Value '),
( 'New York ', 'The Fulton Center ', 'Qty '),
( 'New York ', 'The Fulton Center ', ' Value '),
('Unnamed: 11_level_0', 'Unnamed: 11_level_1', 'Total Qty '),
('Unnamed: 12_level_0', 'Unnamed: 12_level_1', ' Total Value')],
names=[' ', None, 'Transaction Date '])
df1 = (df.iloc[:, :-2]
.rename_axis(columns=['State','Location', None],
index=['Transaction Date','Transaction Document','Partno'])
.stack(level=[0,1])
.reset_index())
print (df1)
Transaction Date Transaction Document Partno State \
0 17/5/2022 8534 P1 California
1 17/5/2022 8654 P6 California
2 18/5/2022 3537 P2 New Jersey
3 20/5/2022 2301 P3 New York
4 20/5/2022 2301 P5 New York
Location Value Qty
0 Great Mall 150.0 1.0
1 Westminster Mall 400.0 2.0
2 Willowbrook 175.0 1.0
3 The Fulton Center 150.0 2.0
4 The Fulton Center 50.0 1.0