Tags: python, pandas, optimization, code-cleanup

How do I create a template for functions in Python?


I have some functions in Python that share the same structure:

  • Load data from a path
  • Do some processing with pandas
  • Save the results in a CSV file

A couple of examples:


import pandas as pd


def generate_report_1(eval_path, output_path):
    df = pd.read_csv(eval_path)
    misclassified_samples = df[df["miss"] == True]
    misclassified_samples.to_csv(output_path)


def generate_report_2(eval_path, output_path):
    df = pd.read_csv(eval_path)

    dict_df = df.to_dict()

    final_results = {}
    for name, metric in dict_df.items():
        # ... do some processing
        pass  # placeholder so the loop body is valid Python

    pd.DataFrame(final_results).to_csv(output_path)
   

In Ruby, we can use blocks and yield to pause a function, run the caller's code, and then return to it. I would like to know a good practice for accomplishing this in Python, since this is a case of unwanted code repetition.

Thanks.


Solution

  • No special construct is needed, just plain Python functions.
    The only trick is to pass the processing function as a parameter to your report function, like this:

    import pandas as pd

    def generate_report(eval_path, processfunc, output_path):
        # Shared skeleton: load, delegate the processing, save.
        df = pd.read_csv(eval_path)
        result = processfunc(df)
        result.to_csv(output_path)

    def process_1(df):
        return df[df["miss"] == True]

    def process_2(df):
        dict_df = df.to_dict()
        final_results = {}
        for name, metric in dict_df.items():
            # ... do some processing
            pass  # placeholder so the loop body is valid Python
        return pd.DataFrame(final_results)
    
    # and then:  
    # generate_report(my_eval_path, process_1, my_output_path)
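
For anyone who wants to try the pattern end to end, here is a minimal runnable sketch. It assumes pandas is installed. The names `select_misclassified`, `column_means` and `run_demo`, as well as the sample data, are made up for illustration; `column_means` is only a stand-in for the processing elided in the question, not the asker's actual logic. Passing `index=False` to `to_csv` is also a choice of this sketch, made so the row index is not written to the report.

```python
import tempfile
from pathlib import Path

import pandas as pd


def generate_report(eval_path, processfunc, output_path):
    """Load a CSV, apply processfunc to the DataFrame, save the result."""
    df = pd.read_csv(eval_path)
    result = processfunc(df)
    result.to_csv(output_path, index=False)


def select_misclassified(df):
    # Assumes "miss" was parsed as a boolean column.
    return df[df["miss"]]


def column_means(df):
    # Hypothetical stand-in for the elided processing step:
    # one row per numeric column with its mean.
    numeric = df.select_dtypes("number")
    return pd.DataFrame(
        {"column": list(numeric.columns), "mean": numeric.mean().to_numpy()}
    )


def run_demo():
    """Write a small sample file, build both reports, and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        eval_path = tmp_dir / "eval.csv"
        sample = pd.DataFrame(
            {
                "sample": ["a", "b", "c"],
                "score": [0.9, 0.2, 0.4],
                "miss": [False, True, True],
            }
        )
        sample.to_csv(eval_path, index=False)

        generate_report(eval_path, select_misclassified, tmp_dir / "report_1.csv")
        generate_report(eval_path, column_means, tmp_dir / "report_2.csv")

        report_1 = pd.read_csv(tmp_dir / "report_1.csv")
        report_2 = pd.read_csv(tmp_dir / "report_2.csv")
    return report_1, report_2


if __name__ == "__main__":
    first, second = run_demo()
    print(first)
    print(second)
```

Each new report then only needs a new function that takes a DataFrame and returns a DataFrame; the loading and saving code stays in one place.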