I have a dataframe like as given below
ID,DIV,APP1,APP2,APP3,Col1,Col4
1,A,AB1,ABC1,ABCD1,20,40
2,A,AB1,ABC2,ABCD2,60,
3,B,BC1,BCD1,BCDE1,10,20
region_1 = pd.read_clipboard(sep=',')
region_1.set_index(['ID','DIV','APP1','APP2','APP3'],inplace=True)
And I have an excel file like as below. You can download them from here
I would like to enter the data into the excel file if their indices and column names match with my dataframe indices and column names. For ex: ID=1
and Col1
(column name) matches between both excel and dataframe columns. So, we type in/enter the value 20 in to the corresponding cell.
I was trying something like below
all_rows = sales.sheets[0]['B5:B8'].value
region_1_cols = sales.sheets[0]['G3:J3'].value
sales.sheets[0]['G5'].value = region_1.reindex(columns=cols,index=rows)
But this doesn't work.
Based on SO suggestion, I tried the below
ws = sales.sheets[0]
ws.range('G5').options(index=False).value = region_1.astype(str).reset_index(drop=True)
But it types in the value in excel without any column name check or ID value check
Is there any iterative approach to do this using for loop etc. We need to retain the format of the excel as it is. So, we type in the value one by one based on matching criteria.
I expect my output to be like as shown below
DataFrame.fillna
can accept an entire DataFrame of filler values, in which case it fills missing values by matching the row/column labels.
So, create an empty DataFrame based on the Excel table and fill it with region_1
. This avoids looping entirely, so it should scale much better.
Extract the Excel table's row index and column names:
sales_index = sales.sheets[0].range((5, 2), (8, 6)).value
# [[1, 'A', 'AB1', 'ABC1', 'ABCD1'],
# [2, 'A', 'AB1', 'ABC2', 'ABCD2'],
# [3, 'B', 'BC1', 'BCD1', 'BCDE1'],
# [4, 'B', 'BC1', 'BCD2', 'BCDE2']]
sales_columns = sales.sheets[0].range('G3:J3').value
# ['Col1', 'Col2', 'Col3', 'Col4']
Note: The exact xlwings
commands might need to be tweaked (I don't have access to xlwings
on Linux), but the idea is just to extract the row index (nested list) and column names (list).
Create an empty sales_df
from the extracted row index and column names:
sales_df = (pd.DataFrame(sales_index, columns=region_1.index.names)
.set_index(region_1.index.names)
.reindex(columns=sales_columns))
# Col1 Col2 Col3 Col4
# ID DIV APP1 APP2 APP3
# 1 A AB1 ABC1 ABCD1 NaN NaN NaN NaN
# 2 A AB1 ABC2 ABCD2 NaN NaN NaN NaN
# 3 B BC1 BCD1 BCDE1 NaN NaN NaN NaN
# 4 B BC1 BCD2 BCDE2 NaN NaN NaN NaN
Fill sales_df
with region_1
:
sales_df = sales_df.fillna(region_1)
# Col1 Col2 Col3 Col4
# ID DIV APP1 APP2 APP3
# 1 A AB1 ABC1 ABCD1 20 NaN NaN 40
# 2 A AB1 ABC2 ABCD2 60 NaN NaN NaN
# 3 B BC1 BCD1 BCDE1 10 NaN NaN 20
# 4 B BC1 BCD2 BCDE2 NaN NaN NaN NaN
Enter the filled sales_df
back to the Excel sheet:
sales.sheet[0].range('G5').value = sales_df.values.tolist()