Tags: azure, azure-functions, azure-blob-storage, azure-data-factory, azure-blob-trigger

How do I automate preprocessing of a complex text file with Python in Microsoft Azure?


I have a complex text file that can be processed into a pandas DataFrame with Python. Which Azure services can I deploy this script to so that it runs automatically whenever a file is uploaded to Blob Storage?

I know this is a loaded question, but I have already tried several approaches: a plain Azure Function, an Azure Function combined with an Azure Batch job, and Azure Data Factory.

Is there any Azure service that can do this in a straightforward way?


Solution

  • Processing a complex text file into a pandas DataFrame with Python

    You can use Azure Databricks to preprocess the complex text file.

  • Triggering automatically whenever a file is uploaded to Blob Storage

    Mount the Blob Storage container on Azure Databricks so that the notebook can read the uploaded files. Consider this sample notebook for mounting Blob Storage.

  • Automating the preprocessing

    Use the Databricks Notebook activity in Data Factory.

    In Data Factory, you can use an event-based trigger (a storage event trigger) to run the Databricks Notebook activity whenever a new file is uploaded to Blob Storage.
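For the mounting step, here is a minimal sketch that assumes the container is accessed with a storage account key kept in a Databricks secret scope. The storage account, container, secret scope and key names are hypothetical placeholders. `dbutils` is passed in as an argument so that the pure part (`build_mount_args`) can be checked outside Databricks.

```python
# Sketch: mount a Blob Storage container on Databricks (run once per workspace).
# All account, container and secret names used with these helpers are hypothetical.


def build_mount_args(account: str, container: str, account_key: str) -> dict:
    """Return keyword arguments for dbutils.fs.mount with a wasbs:// source."""
    return {
        "source": f"wasbs://{container}@{account}.blob.core.windows.net",
        "mount_point": f"/mnt/{container}",
        "extra_configs": {
            f"fs.azure.account.key.{account}.blob.core.windows.net": account_key,
        },
    }


def mount_container(dbutils, account: str, container: str,
                    secret_scope: str, secret_key: str) -> str:
    """Mount the container unless it is already mounted; return the mount point."""
    # Read the account key from a secret scope instead of hard-coding it.
    account_key = dbutils.secrets.get(scope=secret_scope, key=secret_key)
    args = build_mount_args(account, container, account_key)
    # Skip the mount if this mount point already exists (mounting twice raises an error).
    if not any(m.mountPoint == args["mount_point"] for m in dbutils.fs.mounts()):
        dbutils.fs.mount(**args)
    return args["mount_point"]
```

In a notebook cell this would be called as, for example, `mount_container(dbutils, "mystorageacct", "input", "blob-secrets", "account-key")`, with those names replaced by your own.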
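The notebook that the Data Factory pipeline runs might look roughly like the following sketch. It assumes that the pipeline maps the storage event trigger's `@triggerBody().folderPath` and `@triggerBody().fileName` to notebook base parameters named `folderPath` and `fileName` (hypothetical names), and that the container is mounted at `/mnt/input`. It also assumes a hypothetical file format of blank-line-separated `key: value` blocks. Replace `parse_records` with the parsing logic for your actual file.

```python
import posixpath

import pandas as pd

MOUNT_ROOT = "/mnt/input"  # hypothetical mount point of the triggering container


def resolve_mounted_path(folder_path: str, file_name: str,
                         mount_root: str = MOUNT_ROOT) -> str:
    """Map the trigger's 'container/sub/folder' path onto the mount point.

    The folder path reported by the storage event trigger starts with the
    container name, and the mount already points at that container, so the
    first path segment is dropped.
    """
    parts = folder_path.strip("/").split("/", 1)
    sub_folder = parts[1] if len(parts) > 1 else ""
    return posixpath.join(mount_root, sub_folder, file_name)


def parse_records(text: str) -> pd.DataFrame:
    """Parse blank-line-separated blocks of 'key: value' lines into a DataFrame."""
    records, current = [], {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("#"):
            continue  # skip comment lines
        if not line:
            if current:  # a blank line closes the current record
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a "key: value" separator
            current[key.strip()] = value.strip()
    if current:  # flush the last record when there is no trailing blank line
        records.append(current)
    # Keys missing from a record become NaN in the resulting DataFrame.
    return pd.DataFrame(records)


def run_notebook() -> pd.DataFrame:
    # dbutils only exists inside a Databricks notebook.
    folder_path = dbutils.widgets.get("folderPath")  # noqa: F821
    file_name = dbutils.widgets.get("fileName")  # noqa: F821
    path = resolve_mounted_path(folder_path, file_name)
    # Mounted paths are visible to local file APIs under the /dbfs prefix.
    with open("/dbfs" + path, encoding="utf-8") as handle:
        return parse_records(handle.read())


if "dbutils" in globals():  # only runs inside Databricks
    df = run_notebook()
```

Keeping the path handling and parsing in plain functions means they can be tested locally. Only `run_notebook` depends on Databricks, and it reads through the `/dbfs` local-file prefix, which assumes a cluster where the DBFS FUSE mount is available.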