I have an excel table (sample.xlsx) which contains 3 sheets ('Sheet1','Sheet2','Sheet3'). Now I have read all the sheets and combine them into one dataframe.
import pandas as pd
data_df = pd.concat(pd.read_excel("sample.xlsx", header=None, index_col=None, sheet_name=None))
data_df looks like this:
0 1 2
Sheet1 0 val1 val2 val3
1 val11 val21 val31
Sheet2 0 val1 val2 val3
1 val11 val21 val31
Sheet3 0 val1 val2 val3
1 val11 val21 val31
Is there any way to create a new dataframe which has the same shape with data_df but each cell value is the cell position info?
I have tried to get multiple index:
multi_index = data_df.index.levels[:]
and I get:
[['Sheet1', 'Sheet2', 'Sheet3'], [0, 1]]
But I don't know how to use these data to create a new dataframe like this:
0 1 2
0 Sheet1 - A1 Sheet1 - B1 Sheet1 - C1
1 Sheet1 - A2 Sheet1 - B2 Sheet1 - C2
2 Sheet2 - A1 Sheet2 - B1 Sheet2 - C1
3 Sheet2 - A2 Sheet2 - B2 Sheet2 - C2
4 Sheet3 - A1 Sheet3 - B1 Sheet3 - C1
5 Sheet3 - A2 Sheet3 - B2 Sheet3 - C2
Thanks in advance!
Since the values in your data_df
don't matter, you could build the cartesian product of index and columns and build a new dataframe of it.
mapping = dict(enumerate('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))
mapping is needed to convert the column numbers 0,1,2,... to A,B,C,...
UPDATE
In case you have excel sheets with more than 26 columns, you would need to do some more calculations to get the Excel column names like AA, AB,...AZ
and so on. In this post there are several answers which cover that issue.
import itertools
import numpy as np
out = (
pd.DataFrame(
np.array([f"{x[0][0]} - {mapping[int(x[1])]}{x[0][1]+1}"
for x in itertools.product(data_df.index, data_df.columns)])
.reshape(len(data_df), -1)
)
)
print(out)
0 1 2
0 Sheet1 - A1 Sheet1 - B1 Sheet1 - C1
1 Sheet1 - A2 Sheet1 - B2 Sheet1 - C2
2 Sheet2 - A1 Sheet2 - B1 Sheet2 - C1
3 Sheet2 - A2 Sheet2 - B2 Sheet2 - C2
4 Sheet3 - A1 Sheet3 - B1 Sheet3 - C1
5 Sheet3 - A2 Sheet3 - B2 Sheet3 - C2