Search code examples
pythonexcelpandasdataframexlwings

Pandas - compare index and column between excel and dataframe to enter value


I have a dataframe like as given below

ID,DIV,APP1,APP2,APP3,Col1,Col4
1,A,AB1,ABC1,ABCD1,20,40
2,A,AB1,ABC2,ABCD2,60,
3,B,BC1,BCD1,BCDE1,10,20

region_1 = pd.read_clipboard(sep=',') 
region_1.set_index(['ID','DIV','APP1','APP2','APP3'],inplace=True)

And I have an excel file like as below. You can download them from here

enter image description here

I would like to enter the data into the excel file if their indices and column names match with my dataframe indices and column names. For ex: ID=1 and Col1 (column name) matches between both excel and dataframe columns. So, we type in/enter the value 20 in to the corresponding cell.

I was trying something like below

all_rows = sales.sheets[0]['B5:B8'].value
region_1_cols = sales.sheets[0]['G3:J3'].value
sales.sheets[0]['G5'].value = region_1.reindex(columns=cols,index=rows)

But this doesn't work.

Based on SO suggestion, I tried the below

ws = sales.sheets[0] 
ws.range('G5').options(index=False).value = region_1.astype(str).reset_index(drop=True)

But it types in the value in excel without any column name check or ID value check

enter image description here

Is there any iterative approach to do this using for loop etc. We need to retain the format of the excel as it is. So, we type in the value one by one based on matching criteria.

I expect my output to be like as shown below

enter image description here


Solution

  • DataFrame.fillna can accept an entire DataFrame of filler values, in which case it fills missing values by matching the row/column labels.

    So, create an empty DataFrame based on the Excel table and fill it with region_1. This avoids looping entirely, so it should scale much better.

    1. Extract the Excel table's row index and column names:

      sales_index = sales.sheets[0].range((5, 2), (8, 6)).value
      # [[1, 'A', 'AB1', 'ABC1', 'ABCD1'],
      #  [2, 'A', 'AB1', 'ABC2', 'ABCD2'],
      #  [3, 'B', 'BC1', 'BCD1', 'BCDE1'],
      #  [4, 'B', 'BC1', 'BCD2', 'BCDE2']]
      
      sales_columns = sales.sheets[0].range('G3:J3').value
      # ['Col1', 'Col2', 'Col3', 'Col4']
      

      Note: The exact xlwings commands might need to be tweaked (I don't have access to xlwings on Linux), but the idea is just to extract the row index (nested list) and column names (list).

    2. Create an empty sales_df from the extracted row index and column names:

      sales_df = (pd.DataFrame(sales_index, columns=region_1.index.names)
                    .set_index(region_1.index.names)
                    .reindex(columns=sales_columns))
      #                         Col1  Col2  Col3  Col4
      # ID DIV APP1 APP2 APP3                         
      # 1  A   AB1  ABC1 ABCD1   NaN   NaN   NaN   NaN
      # 2  A   AB1  ABC2 ABCD2   NaN   NaN   NaN   NaN
      # 3  B   BC1  BCD1 BCDE1   NaN   NaN   NaN   NaN
      # 4  B   BC1  BCD2 BCDE2   NaN   NaN   NaN   NaN
      
    3. Fill sales_df with region_1:

      sales_df = sales_df.fillna(region_1)
      #                         Col1  Col2  Col3  Col4
      # ID DIV APP1 APP2 APP3                         
      # 1  A   AB1  ABC1 ABCD1    20   NaN   NaN    40
      # 2  A   AB1  ABC2 ABCD2    60   NaN   NaN   NaN
      # 3  B   BC1  BCD1 BCDE1    10   NaN   NaN    20
      # 4  B   BC1  BCD2 BCDE2   NaN   NaN   NaN   NaN
      
    4. Enter the filled sales_df back to the Excel sheet:

      sales.sheet[0].range('G5').value = sales_df.values.tolist()