We have the following requirement to ingest data into an Excel file:
- Query data from a Parquet file stored in Azure Data Lake Storage Gen2
- Ingest data into an Excel file
- Store the Excel file in SharePoint Online
What would be the optimal approach to implement this? Would it be better to use server-side CSOM or other Azure tools?
Your help is much appreciated.
I would probably approach this like so (a few components, but the least amount of custom code):
Flow
- Use a Logic App to copy the Parquet file to Azure Data Lake (or Blob) Storage and trigger a Data Factory pipeline
- The Data Factory pipeline executes a Databricks Python or Scala notebook (both can transform the data and save it as Excel output; see the sketch after this list)
- Use a Logic App to transfer the Excel file back to SharePoint
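
To give a feel for how little code that notebook needs, here is a minimal Python sketch. It assumes it runs inside a Databricks notebook (where `spark` is predefined), that the cluster can authenticate to the storage account, that the `abfss://` path is a placeholder for your actual ADLS Gen2 location, and that openpyxl is available for the Excel output (install it on the cluster if not):

```python
# Minimal sketch of the Databricks notebook step.
# Placeholders: <container>, <storageaccount>, file paths.
import pandas as pd

# Read the Parquet data from ADLS Gen2 (path is a placeholder).
df = spark.read.parquet(
    "abfss://<container>@<storageaccount>.dfs.core.windows.net/input/data.parquet"
)

# Apply whatever transformation/filtering you need here.
result = df  # e.g. df.filter(...), df.select(...)

# Excel is not a native Spark sink, so collect to pandas on the driver and
# write with openpyxl. Fine for modest result sets; for very large data,
# write in chunks or use a Spark Excel library instead.
result.toPandas().to_excel("/dbfs/tmp/output.xlsx", index=False, engine="openpyxl")
```

The notebook can drop the .xlsx on DBFS or in a blob container; the Logic App then picks it up and pushes it to SharePoint with the SharePoint connector's "Create file" action.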
Because
- Logic Apps are amazing for SharePoint operations and task orchestration.
- Data Factory nicely orchestrates Databricks notebook execution.
- Databricks can easily read Parquet files and just as easily save data as Excel files, so it seems like the perfect choice.
While there are three components to wire up, in the best-case scenario it takes literally under 10 lines of code to do what you need. The rest is just 'clicking'.
You could potentially skip Data Factory and call Databricks via its REST API from Logic Apps (roughly the call sketched below), but why bother when there are out-of-the-box components to do that?
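
For completeness, if you did go the REST route, the Logic App's HTTP action would essentially do the equivalent of this call against the Databricks Jobs API (sketched here in Python; the workspace URL, token and job ID are placeholders, and in a real Logic App the token would come from Key Vault rather than being hard-coded):

```python
# Rough sketch of triggering an existing Databricks job via the Jobs API,
# i.e. what a Logic App HTTP action would do if you skipped Data Factory.
import requests

DATABRICKS_HOST = "https://<workspace>.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                            # placeholder
JOB_ID = 123                                                 # placeholder

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"job_id": JOB_ID},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["run_id"])  # poll /api/2.1/jobs/runs/get with this id
```

You would then need to poll for run completion yourself, which is exactly the orchestration Data Factory gives you for free.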