We have the following requirement to ingest data into an Excel file:
- Query data from a Parquet file stored in Azure Data Lake Storage Gen2
- Ingest data into an Excel file
- Store the Excel file in SharePoint Online
What would be the optimal approach to implement this? Would it be better to use server-side CSOM or other Azure tools?
Your help is much appreciated.
I would probably approach this like so (a few components, but the least amount of custom code):
Flow
- Use a Logic App to copy the Parquet file to Azure Data Lake (or Blob) Storage and trigger a Data Factory pipeline
- The Data Factory pipeline executes a Databricks Python or Scala notebook (both can transform the data and save it as Excel output; see the sketch after this list)
- Use a Logic App to transfer the Excel file back to SharePoint
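
To give a feel for how little code that notebook needs, here is a minimal Python sketch. It assumes it runs inside a Databricks notebook (where `spark` is predefined), that the cluster can authenticate to the storage account, that the `abfss://` path is a placeholder for your actual ADLS Gen2 location, and that openpyxl is available for the Excel output (install it on the cluster if not):

```python
# Minimal sketch of the Databricks notebook step.
# Placeholders: <container>, <storageaccount>, file paths.
import pandas as pd

# Read the Parquet data from ADLS Gen2 (path is a placeholder).
df = spark.read.parquet(
    "abfss://<container>@<storageaccount>.dfs.core.windows.net/input/data.parquet"
)

# Apply whatever transformation/filtering you need here.
result = df  # e.g. df.filter(...), df.select(...)

# Excel is not a native Spark sink, so collect to pandas on the driver and
# write with openpyxl. Fine for modest result sets; for very large data,
# write in chunks or use a Spark Excel library instead.
result.toPandas().to_excel("/dbfs/tmp/output.xlsx", index=False, engine="openpyxl")
```

The notebook can drop the .xlsx on DBFS or in a blob container; the Logic App then picks it up and pushes it to SharePoint with the SharePoint connector's "Create file" action.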
Because
- Logic Apps are amazing for SharePoint operations and task orchestration.
- Data Factory nicely orchestrates Databricks notebook execution.
- Databricks can easily read Parquet files and just as easily save data as Excel files, so it seems like the perfect choice.
While there are three components to wire up, in the best-case scenario it takes literally under 10 lines of code to do what you need. The rest is just 'clicking'.
You could potentially skip Data Factory and call Databricks via its REST API from Logic Apps (roughly the call sketched below), but why bother when there are out-of-the-box components to do that?
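
For completeness, if you did go the REST route, the Logic App's HTTP action would essentially do the equivalent of this call against the Databricks Jobs API (sketched here in Python; the workspace URL, token and job ID are placeholders, and in a real Logic App the token would come from Key Vault rather than being hard-coded):

```python
# Rough sketch of triggering an existing Databricks job via the Jobs API,
# i.e. what a Logic App HTTP action would do if you skipped Data Factory.
import requests

DATABRICKS_HOST = "https://<workspace>.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                            # placeholder
JOB_ID = 123                                                 # placeholder

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"job_id": JOB_ID},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["run_id"])  # poll /api/2.1/jobs/runs/get with this id
```

You would then need to poll for run completion yourself, which is exactly the orchestration Data Factory gives you for free.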