Search code examples
pandasdataframeopenpyxl

How to get openpyxl to work with a styler formatted dataframe


Im comparing two HR related excel sheets. I'm comparing a previous weeks info to a current weeks info to ensure people were promoted as they should be.

Overall it produces an excel sheet that combines both dataframes and highlights the differences of any employee with a promotion or step increase. This works, but I have to go into the excel sheet after and manually make formatting changes before I can provide a "final copy" for others to use.

My issue is there is a function that highlights differences in pay settings between the two dataframes and highlights them. Prior to exporting to_excel, I am trying to make changes to that excel file using some type of formatting. This produces an error because the color_changes = merger_changes.style.apply(highlight, axis = None) produces a Styler, which openpyxl wont work with because its a styler.

I tried different products to get the changed columns to highlight and then create a final excel product using openpyxl that will produce an excel sheet that I dont have to edit after. I keep getting errors because openpyxl wont work with a styler. Am I coding this out of order? Will another product work with a styler rather than openpyxl or am I just missing a block of code to make this all work together?

I want it to not only highlight the differences in the "new step" and "new grade" but then export to an excel sheet that I can adjust the format (column widths and date format) prior to exporting to_excel so the excel sheet that it exports is a final product.

import pandas as pd  
import datetime as dt

#new_WFRL  

new_WFRL = pd.read_excel("H:/DIR/Human Resources/HR Audits/Raw Files/WFRL 6.24.24.xlsx", usecols=["Empl ID", "Name", "Grade", "Step", "Grade Entry Date", "Step Date", "WGI Due Dt"], parse_dates=["Grade Entry Date", "Step Date", "WGI Due Dt"],  date_format="%m/%d/%y", index_col=False).sort_values("Name")   
new_WFRL = new_WFRL.add_prefix("New ")  
new_WFRL[['New Grade Entry Date', 'New Step Date', 'New WGI Due Dt']] = new_WFRL[['New Grade Entry Date', 'New Step Date', 'New WGI Due Dt']].apply(pd.to_datetime, format='mixed')

#old_WFRL  
old_WFRL = pd.read_excel("H:/DIR/Human Resources/HR Audits/Raw Files/WFRL 6.14.24.xlsx", usecols=["Empl ID", "Name", "Grade", "Step", "Grade Entry Date", "Step Date", "WGI Due Dt"], parse_dates=["Grade Entry Date", "Step Date", "WGI Due Dt"], date_format="%m/%d/%y", index_col=False).sort_values("Name")   
old_WFRL = old_WFRL.add_prefix("Old ")  
old_WFRL[['Old Grade Entry Date', 'Old Step Date', 'Old WGI Due Dt']] = old_WFRL[['Old Grade Entry Date', 'Old Step Date', 'Old WGI Due Dt']].apply(pd.to_datetime, format='mixed')
  
merged_WFRL = pd.merge(old_WFRL, new_WFRL, how = "outer", left_on = "Old Empl ID", right_on = "New Empl ID")  
merged_WFRL 
#this is the function that will show a change  

def compare_WFRL(df):  
    if df["Old Grade"] == df["New Grade"] and df["Old Step"]== df["New Step"]:  
        return 1  
    else:  
        return 0  

#applies the above function  
merged_WFRL["changes"] = merged_WFRL.apply(compare_WFRL, axis =1)  

  

#filters all the rows so changes are the only ones left  
merged_changes = merged_WFRL[merged_WFRL["changes"] == 0]  
merged_changes 

def highlight(df):  
    grade_mismatch = df['Old Grade'] != df['New Grade']  
    step_mismatch = df['Old Step'] != df['New Step']  

     
    color = df.copy()  
    color.loc[:, :] = ''  
    color.loc[grade_mismatch, 'New Grade'] = 'background-color: yellow'  
    color.loc[step_mismatch, 'New Step'] = 'background-color: red'  

    return color  

merged_changes.pop('changes')  


#applies the correct style as designed above in "highlight"
#this creates the dataframe into a styler - openpyxl wont work with stylers
color_changes = merged_changes.style.apply(highlight, axis = None) 

#index=False drops the index column when exporting to excel. 

color_changes.to_excel("H:/DIR/Human Resources/HR Audits/WFRL Comparison Reports/6.24.24v6.xlsx", index=False)

Solution

  • I would suggest to save the dataframe first using the pandas to_excel() function.

    Then using the openpyxl library to format the excel.

    import pandas as pd  
    import datetime as dt
    
    #new_WFRL
    new_WFRL = pd.read_excel("H:/DIR/Human Resources/HR Audits/Raw Files/WFRL 6.24.24.xlsx", usecols=["Empl ID", "Name", "Grade", "Step", "Grade Entry Date", "Step Date", "WGI Due Dt"], parse_dates=["Grade Entry Date", "Step Date", "WGI Due Dt"],  date_format="%m/%d/%y", index_col=False).sort_values("Name")   
    new_WFRL = new_WFRL.add_prefix("New ")  
    new_WFRL[['New Grade Entry Date', 'New Step Date', 'New WGI Due Dt']] = new_WFRL[['New Grade Entry Date', 'New Step Date', 'New WGI Due Dt']].apply(pd.to_datetime, format='mixed')
    
    #old_WFRL  
    old_WFRL = pd.read_excel("H:/DIR/Human Resources/HR Audits/Raw Files/WFRL 6.14.24.xlsx", usecols=["Empl ID", "Name", "Grade", "Step", "Grade Entry Date", "Step Date", "WGI Due Dt"], parse_dates=["Grade Entry Date", "Step Date", "WGI Due Dt"], date_format="%m/%d/%y", index_col=False).sort_values("Name")   
    old_WFRL = old_WFRL.add_prefix("Old ")  
    old_WFRL[['Old Grade Entry Date', 'Old Step Date', 'Old WGI Due Dt']] = old_WFRL[['Old Grade Entry Date', 'Old Step Date', 'Old WGI Due Dt']].apply(pd.to_datetime, format='mixed')
      
    merged_WFRL = pd.merge(old_WFRL, new_WFRL, how = "outer", left_on = "Old Empl ID", right_on = "New Empl ID")  
    merged_WFRL 
    #this is the function that will show a change  
    
    def compare_WFRL(df):  
        if df["Old Grade"] == df["New Grade"] and df["Old Step"]== df["New Step"]:  
            return 1  
        else:  
            return 0  
    
    #applies the above function  
    merged_WFRL["changes"] = merged_WFRL.apply(compare_WFRL, axis =1)  
    
    #filters all the rows so changes are the only ones left  
    merged_changes = merged_WFRL[merged_WFRL["changes"] == 0]
    
    # Export DataFrame to Excel
    export_file = "H:/DIR/Human Resources/HR Audits/WFRL Comparison Reports/6.24.24v6.xlsx"
    merged_changes.to_excel(export_file, index=False)
    
    ######################################################################################################
    from openpyxl import load_workbook
    from openpyxl.styles import PatternFill
    
    # Load the workbook and select the active worksheet
    wb = load_workbook(export_file)
    ws = wb.active
    
    # Define cell styles
    yellow_fill = PatternFill(start_color="FFFF00", end_color="FFFF00", fill_type="solid")
    red_fill = PatternFill(start_color="FF0000", end_color="FF0000", fill_type="solid")
    
    # Apply conditional formatting
    for row in range(2, len(merged_changes) + 2):
        old_grade_cell = ws.cell(row=row, column=1)  # Old Grade
        new_grade_cell = ws.cell(row=row, column=2)  # New Grade
        old_step_cell = ws.cell(row=row, column=3)  # Old Step
        new_step_cell = ws.cell(row=row, column=4)  # New Step
        
        # Check for New Grade not equal to Old Grade
        if new_grade_cell.value != old_grade_cell.value:
            new_grade_cell.fill = yellow_fill
        
        # Check for New Step not equal to Old Step
        if new_step_cell.value != old_step_cell.value:
            new_step_cell.fill = red_fill
    
    # Save the workbook
    wb.save(export_file)
    

    I have assumed only 4 columns to be present in the merged_changes dataframe in the sequence of 'Old Grade', 'New Grade', 'Old Step', 'New Step'