Forum,
I am currently looking into Azure Synapse as an option for migrating our on-prem data architecture. I am excited by the functionality it offers: SQL pools, Spark pools, and the accompanying notebooks. I understand that Synapse can function as an all-in-one data platform, where my data scientists and data analysts can use its functionality to deliver insights at will. However, a large part of the work my team does is creating data products.
We currently have a Kubernetes cluster with several stand-alone APIs that perform data science operations within our larger software system. They can be thought of as microservices. Most of the ETL is done in our SQL Server, and the microservices in our K8s cluster (usually Python plus a few Python packages and FastAPI) typically get the data they need from SQL Server through a SQL query over an ODBC connection.
Now my question is: how suitable is Synapse for such an architecture? Can I call on a SQL pool or Spark pool to do the heavy data lifting from outside the Azure environment, say from a Kubernetes pod?
Unfortunately, there is no native integration between Azure Synapse Analytics and Kubernetes services such as Azure Kubernetes Service (AKS).
Synapse SQL lets you run SQL queries, while Apache Spark pools execute batch and stream processing on big data. A dedicated SQL pool queries the data stored in its own relational tables, whereas Spark SQL can be integrated with existing data preparation or data science projects that you may have in Azure Databricks or Azure Machine Learning.
Also, according to this third-party document, Azure Synapse Analytics can't integrate with Kubernetes services.
As a workaround, you can copy or move your data from Kubernetes into Azure services such as a dedicated SQL pool, Azure Blob Storage, or Azure Data Lake Storage, and then work with it from a Synapse pipeline or Spark pool.
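As a rough illustration of the "copy the data out" step, the sketch below serializes rows produced inside a pod to CSV bytes and computes a storage path for them. The folder layout (`raw/<dataset>/ingest_date=YYYY-MM-DD/part-NNNNN.csv`) and the helper names are assumptions made up for this example, not a Synapse requirement.

```python
import csv
import io
from datetime import date


def rows_to_csv_bytes(header, rows):
    """Serialize a header plus rows to UTF-8 CSV bytes, ready to upload as one blob."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def staging_path(dataset, run_date, part=0):
    """Hypothetical layout: one folder per dataset and ingest date."""
    return f"raw/{dataset}/ingest_date={run_date.isoformat()}/part-{part:05d}.csv"


if __name__ == "__main__":
    payload = rows_to_csv_bytes(["id", "score"], [(1, 0.5), (2, 0.75)])
    print(staging_path("predictions", date(2023, 1, 31)), len(payload), "bytes")
```

The bytes could then be uploaded with the `azure-storage-blob` package (for example `BlobClient.upload_blob(payload, overwrite=True)`) to a container that a Synapse pipeline or Spark pool reads from. CSV keeps the sketch dependency-free; columnar formats such as Parquet are a common alternative for Spark workloads.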