When invoking Azure ML Batch Endpoints (creating jobs for inferencing), the run() method should return a pandas DataFrame or an array as explained here
However this example shown, doesn't represent an output with headers for a csv, as it is often needed.
The first thing I've tried was to return the data as a pandas DataFrame and the result is just a simple csv with a single column and without the headers.
When trying to pass the values with several columns and it's corresponding headers, to be later saved as csv, as a result, I'm getting awkward square brackets (representing the lists in python) and the apostrophes (representing strings)
I haven't been able to find documentation elsewhere, to fix this:
This is the way I found to create a clean output in csv format using python, from a batch endpoint invoke in AzureML:
def run(mini_batch):
batch = []
for file_path in mini_batch:
df = pd.read_csv(file_path)
# Do any data quality verification here:
if 'id' not in df.columns:
logger.error("ERROR: CSV file uploaded without id column")
return None
else:
df['id'] = df['id'].astype(str)
# Now we need to create the predictions, with previously loaded model in init():
df['prediction'] = model.predict(df)
# or alternative, df[MULTILABEL_LIST] = model.predict(df)
batch.append(df)
batch_df = pd.concat(batch)
# After joining all data, we create the columns headers as a string,
# here we remove the square brackets and apostrophes:
azureml_columns = str(batch_df.columns.tolist())[1:-1].replace('\'','')
result = []
result.append(azureml_columns)
# Now we have to parse all values as strings, row by row,
# adding a comma between each value
for row in batch_df.iterrows():
azureml_row = str(row[1].values).replace(' ', ',')[1:-1].replace('\'','').replace('\n','')
result.append(azureml_row)
logger.info("Finished Run")
return result