Search code examples
pythonexcelxlsxxlsxwriter

Excel Columns Comparison in Generic way and highlight the differences with some color


I am trying to convert a csv file to excel as per below code and i have few columns which need to be compared inside excel and do conditional format

In the below code i have hard coded cells "B2<>C2" , "C2:C1048576". Without providing B2,C2 can i make those cells name to read in generic way and compare the complete data in excel

Wherever i have _SRC & _TGT those columns need to be compared simultaneously. Below link has both the sample excels

for csvfile in glob.glob(os.path.join('.', "file1.csv")):
    workbook = Workbook(csvfile[:-4] + '.xlsx')
    worksheet = workbook.add_worksheet()
    with open(csvfile, 'rt', encoding='utf8') as f:
        reader = csv.reader(f)
        for r, row in enumerate(reader):
            for c, col in enumerate(row):
                worksheet.write(r, c, col)
    orange_format = workbook.add_format({'bg_color':   '#FFEB9C',
                                     'font_color': '#9C6500'})

    worksheet.conditional_format('C2:C1048576', {'type':'formula',
                                       'criteria':'=B2<>C2',
                                       'format':orange_format})

    workbook.close()

I have two types of excels data as below

enter image description here

enter image description here


Solution

  • If you're asking how to make

    worksheet.conditional_format('C2:C1048576', {'type':'formula',
                                       'criteria':'=B2<>C2',
                                       'format':orange_format})
    

    reusable to other columns, then try something like this:

    def get_previous_column(column):
        # if you're expecting to go above column `Z` then you'll need to improve this function
        return chr(ord(column) - 1)
    
    def make_conditional_format(worksheet, col_src, col_tgt):
        column_range = f"{col_tgt}2:{col_tgt}1048576"
        worksheet.conditional_format(
            column_range, 
            {
                'type':'formula',
                'criteria':f'={col_src}2<>{col_tgt}2',
                'format':orange_format
             }
        )
    
    columns_to_format = ['C', 'E', 'K', 'M']
    
    for col_tgt in columns_to_format:
        col_src = get_previous_column(col_tgt)
        make_conditional_format(worksheet, col_src, col_tgt)