Search code examples
python-3.xfunctionpandaspython-modulexlsxwriter

Variables assignement before function


I have created a package to quickly transform datas using pandas and xlsxwriter. This worked pretty well and I did a few functions successfully. But recently I've hit a wall:

For a few functions I need to define variables first but they are not basic types (list, tuple, str etc.) but for instance a dataframe. I've looked into global variables and saw they're are not recommanded (and wouldn't know where to put them) and I also looked into classes but I don't know how to solve my problem using them. I've also tried creating an empty dataframe but got an empty dataframe after the function.

What I'm trying to do is a read function with pandas for .csv or .xlsx and a function for saving with Xlsxwriter engine. The goal is to change as little as possible in the code to transform data frequently and rapidly (e.g. i have functions doing LEFT,RIGHT like in Excel or even MIDDLE with column numbers) and have an easy and short code in main.py.

Here is the stripped down version of my code which uses 2 python files (main.py and format_operations.py). I have added commentaries where I'm having issues. Thanks in advance for your help!

    """ 
    main.py
    """
    import format_operations as tbfrm #import another python file in the same folder
    import pandas as pd
    import numpy as np
    import xlsxwriter.utility

    #file settings
    file_full_path= "C:/Tests/big_data.xlsx"
    file_save_to= "C:/Tests/Xlsxwriter.xlsx"
    sheet_name_save_to= "Xlswriter"
    dfname = ??? #I need to create the variable but I don't know how
    tbfrm.FCT_universal_read(dfname,file_full_path) #CAN'T GET IT TO WORK

    #column operations and formatting
    columns_numeric = [3,6] # (with pandas) list of columns with number values by iloc number, starts at 0 which is column A in Excel
    tbfrm.FCT_columns_numeric(dfname,columns_numeric) #example of a WORKING function (if dfname is defined)

    #write with Xlsxwriter engine
    XLWRITER_DF = ??? #same problem as before, how to create the variable?
    workbookvarname = ??? #same here
    worksheetvarname = ??? # same here
    tbfrm.FCT_df_xlsxwriter(XLWRITER_DF,dfname,file_save_to,sheet_name_save_to,workbookvarname,worksheetvarname) #CAN'T GET IT TO WORK

    #### WORKING piece of code I want to execute after saving with Xlsxwriter engine ####
    worksheet.set_zoom(80)

    # Conditional formatting
    color_range_1 = "J1:J{}".format(number_rows+1)
    FORMAT1 = workbook.add_format({'bg_color': '#FFC7CE','font_color': '#9C0006'})
    FORMAT2 = workbook.add_format({'bg_color': '#C6EFCE','font_color': '#006100'})
    worksheet.conditional_format(color_range_1, {'type': 'bottom','value': '5','format': FORMAT1})
    worksheet.conditional_format(color_range_1, {'type': 'top','value': '5','format': FORMAT2})

Other file:

    """ 
    format_operations.py
    """
    import pandas as pd
    import numpy as np
    import xlsxwriter.utility

    def FCT_universal_read(dfname,file_full_path):
        if ".xls" in file_full_path:
            dfname = pd.read_excel(file_full_path) #optional arguments:sheetname='Sheet1', header=0 , dtype=object to preserve values
        if ".csv" in file_full_path:
            dfname = pd.read_csv(file_full_path)

    # save file with XLSXWriter engine for additional options to pandas
    def FCT_df_xlsxwriter(XLWRITER_DF,dfname,file_save_to,sheet_name_save_to,workbookvarname,worksheetvarname):
        XLWRITER_DF = pd.ExcelWriter(file_save_to, engine='xlsxwriter')
        dfname.to_excel(XLWRITER_DF, sheet_name=sheet_name_save_to,encoding='utf-8')
        workbookvarname = XLWRITER_DF.book
        worksheetvarname = XLWRITER_DF.sheets[sheet_name_save_to]

    #format as numbers
    def FCT_columns_numeric(dfname,columns_numeric):
        for x in columns_numeric:
            dfname.iloc[:,x] = pd.to_numeric(dfname.iloc[:,x])

Solution

  • Your FCT_universal_read function should not modify a dataframe but instead return a new one:

    def FCT_universal_read(file_full_path):
        if file_full_path.split('.')[-1] == "xls":
            df = pd.read_excel(file_full_path) #optional arguments:sheetname='Sheet1', header=0 , dtype=object to preserve values
        if file_full_path.split('.')[-1] == "csv":
            df = pd.read_csv(file_full_path)
        return df
    

    And in your main, do:

    dfname = tbfrm.FCT_universal_read(file_full_path)
    

    Same answer for FCT_df_xlsxwriter, you should rewrite it with a return so that you can do:

        XLWRITER_DF, workbookvarname,worksheetvarname =  tbfrm.FCT_df_xlsxwriter(dfname,file_save_to,sheet_name_save_to) 
    

    To grasp how python is dealing with the arguments you pass to a function, you should read these blog posts:

    https://jeffknupp.com/blog/2012/11/13/is-python-callbyvalue-or-callbyreference-neither/ https://robertheaton.com/2014/02/09/pythons-pass-by-object-reference-as-explained-by-philip-k-dick/