c# .net apache-spark databricks azure-databricks

Is there a way to query Databricks DBFS or parquets via .NET for Apache Spark?


In a nutshell, I'm trying to explore the possibility of serving data from a Databricks workspace to a C#/.NET application for user interaction and ad-hoc queries. I've spent some time setting up Databricks-Connect, which seems to be working insofar as I can run Python Spark jobs on the Databricks cluster from my local machine.

I'm also trying to walk through the setup and run the samples from .NET for Apache Spark.

My problem is that I'm having trouble finding documentation, samples, or demos of anything involving these two working together. Is it possible to set up a Spark session in .NET that would allow Spark SQL queries against data in the Databricks cluster? Is Databricks-Connect the appropriate avenue for this?

Above all, is it possible to deploy a .NET application to an Azure App Service that could serve the Databricks data?


Solution

  • The standard way to do this is to use JDBC or ODBC connected to an existing cluster or SQL Analytics. From .NET, the ODBC route should be supported via ADO.NET. It is easier than going down the route of setting up databricks-connect, and it is potentially cheaper if you use SQL Analytics rather than the interactive clusters that databricks-connect uses.
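
To make the ODBC route concrete, here is a minimal sketch of what it could look like from C# using ADO.NET's System.Data.Odbc. It assumes the Databricks (Simba Spark) ODBC driver is installed on the machine running the app. The connection-string keys are the ones I understand that driver to accept, so check them against the driver documentation. The class name, environment variable names, and `my_table` are hypothetical placeholders, not anything from the question.

```csharp
using System;
using System.Data.Odbc; // on .NET Core / .NET 5+ this comes from the System.Data.Odbc NuGet package

public static class DatabricksOdbcSketch
{
    // Builds a DSN-less connection string for token authentication.
    // Assumption: key names follow the Simba Spark ODBC driver; verify before use.
    public static string BuildConnectionString(string host, string httpPath, string token)
    {
        return string.Join(";",
            "Driver=Simba Spark ODBC Driver",
            "Host=" + host,
            "Port=443",
            "HTTPPath=" + httpPath,
            "SSL=1",
            "ThriftTransport=2",
            "AuthMech=3",
            "UID=token",
            "PWD=" + token);
    }

    public static void Main()
    {
        // Hypothetical environment variable names; use whatever configuration store you prefer.
        string host = Environment.GetEnvironmentVariable("DATABRICKS_HOST");
        string httpPath = Environment.GetEnvironmentVariable("DATABRICKS_HTTP_PATH");
        string token = Environment.GetEnvironmentVariable("DATABRICKS_TOKEN");

        if (host == null || httpPath == null || token == null)
        {
            Console.WriteLine("Set DATABRICKS_HOST, DATABRICKS_HTTP_PATH and DATABRICKS_TOKEN first.");
            return;
        }

        using (var connection = new OdbcConnection(BuildConnectionString(host, httpPath, token)))
        {
            connection.Open();

            // my_table is a placeholder. Spark SQL should also accept a direct file
            // reference such as parquet.`<path>` in the FROM clause.
            using (var command = new OdbcCommand("SELECT * FROM my_table LIMIT 10", connection))
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Print the first column of each returned row.
                    Console.WriteLine(reader.GetValue(0));
                }
            }
        }
    }
}
```

The same connection code could run inside an Azure App Service, provided the ODBC driver is available in that environment (for example via a custom container). Whether that is practical depends on the hosting plan, so treat it as something to verify rather than a given.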