I am trying to import an excel spreadsheet in pandas but I have some issue due to data being "structured visually by user" to be user friendly visually and not using proper data structures.
As a result, when imported in pandas as dataframe I get this:
>>> df
Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7 Unnamed: 8 Unnamed: 9 Unnamed: 10 Unnamed: 11 Unnamed: 12
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN ROUTER 1 NaN NaN ROUTER 2 NaN NaN ROUTER 3 NaN NaN
4 NaN NaN NaN SERVERS IP NETMASK GATEWAY IP NETMASK GATEWAY IP NETMASK GATEWAY
5 NaN NaN NaN SERVER 1 10.0.0.10 255.255.255.0 10.0.0.1 10.1.0.10 255.255.255.0 10.0.0.1 10.2.0.10 255.255.255.0 10.0.0.1
6 NaN NaN NaN SERVER 2 10.0.1.10 255.255.255.0 10.0.1.1 10.1.1.10 255.255.255.0 10.0.1.1 10.2.1.10 255.255.255.0 10.0.1.1
What I want to achieve as desired output
is this:
>>> result
ROUTER 1 ROUTER 2 ROUTER 3
SERVERS IP NETMASK GATEWAY IP NETMASK GATEWAY IP NETMASK GATEWAY
SERVER 1 10.0.0.10 255.255.255.0 10.0.0.1 10.1.0.10 255.255.255.0 10.0.0.1 10.2.0.10 255.255.255.0 10.0.0.1
SERVER 2 10.0.1.10 255.255.255.0 10.0.1.1 10.1.1.10 255.255.255.0 10.0.1.1 10.2.1.10 255.255.255.0 10.0.1.1
where I get proper multi-indexing of columns (which is exactly what I have in Excel..)
((ROUTER 1, [ IP, NETMASK, GATEWAY]), (ROUTER 2, [ IP, NETMASK, GATEWAY]), (ROUTER 3, [ IP, NETMASK, GATEWAY]))
My attempts so far are rather poor.. I got stuck here..
>>> pd.read_excel('Test.xlsx',header=3).dropna(axis=1, how='all').stack().unstack(0).reset_index(drop=True).fillna(method='ffill')
0 1 2 3
0 NaN SERVERS SERVER 1 SERVER 2
1 ROUTER 1 IP 10.0.0.10 10.0.1.10
2 ROUTER 1 NETMASK 255.255.255.0 255.255.255.0
3 ROUTER 1 GATEWAY 10.0.0.1 10.0.1.1
4 ROUTER 2 IP 10.1.0.10 10.1.1.10
5 ROUTER 2 NETMASK 255.255.255.0 255.255.255.0
6 ROUTER 2 GATEWAY 10.0.0.1 10.0.1.1
7 ROUTER 3 IP 10.2.0.10 10.2.1.10
8 ROUTER 3 NETMASK 255.255.255.0 255.255.255.0
9 ROUTER 3 GATEWAY 10.0.0.1 10.0.1.1
Thank you so much for your help in advance.
Combining what you've tried with what you've been advised by @Mayank, here is my 2 cents:
>>> pd.read_excel('Test.xls', skiprows=3, usecols="D:M")
.stack()
.unstack(0)
.reset_index(drop=True)
.fillna(method='ffill')
.pivot_table(columns=[0,1] aggfunc='first')
I think that should give you what you're looking for.