azure, apache-spark, databricks, azure-databricks

.NET UDF for Apache Spark must be callable from Azure Databricks Notebook


I have a .NET console application that performs some operations on the given inputs and produces outputs. I have written a Spark wrapper around it, and it works fine locally. I am having trouble installing the .NET publish package and its dependencies on an Azure Databricks cluster (the one this notebook is attached to).

using Microsoft.Spark.Sql;
using System;

namespace MySparkApp
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a Spark session
            SparkSession spark = SparkSession
                .Builder()
                .AppName("word_count_sample")
                .GetOrCreate();

            // Register UDFs
            Func<string, string> getName = GetName;
            spark.Udf().Register("UDF_GetName", getName);

            // Create initial DataFrame
            DataFrame dataFrame = spark.Read().Text("input.txt");

            // Count words
            DataFrame words = dataFrame
                .Select(Functions.Split(Functions.Col("value"), " ").Alias("words"))
                .Select(Functions.Explode(Functions.Col("words")).Alias("word"))
                .GroupBy("word")
                .Count()
                .OrderBy(Functions.Col("count").Desc());

            // Show results
            words.Show();

            // Stop Spark session
            spark.Stop();
        }

        public static string GetName(string name)
        {
            return "Hello " + name;
        }
    }
}

Can you please guide me on how to install my dependencies and then invoke the UDFs from a notebook?

What I have done?

Any guidance would be appreciated. Thanks!


Solution

  • I posted the same question to the Azure Support Community: https://learn.microsoft.com/en-us/answers/questions/1052175/net-udf-for-apache-spark-must-be-callable-from-azu.html

    There I got this answer:

    " Hi @NeerajKumarSingh-7721,

    I can see the steps for deploying your .NET app to a Databricks workspace. Kindly check this link.

    Did you try it? Where exactly are you stuck? If I am not wrong, you can publish your app to Databricks and submit it as a job to run. If you are looking to call a function inside your .NET app from a Databricks notebook, then that is not possible. Azure Databricks does not support the C# language. You need to consider using Synapse Analytics in that case, because there we have native support for C# as well.

    Please let me know how it goes. "

    Hence the conclusion is: we cannot call .NET UDFs from an Azure Databricks notebook. However, other options are available, for example Azure Synapse.

    Thanks to MSFT for answering this!
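
Since the reply suggests publishing the app and running it as a job, the UDF has to be invoked from inside the .NET app itself rather than from a notebook cell. Below is a minimal sketch of that, assuming the same `GetName` method and `input.txt` file from the question; the temp view name `lines` and the app name `udf_call_sample` are hypothetical, introduced only for illustration.

```csharp
using System;
using Microsoft.Spark.Sql;

namespace MySparkApp
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a Spark session (app name is a hypothetical placeholder)
            SparkSession spark = SparkSession
                .Builder()
                .AppName("udf_call_sample")
                .GetOrCreate();

            // Register the UDF under the same name used in the question
            Func<string, string> getName = GetName;
            spark.Udf().Register("UDF_GetName", getName);

            // Read the input and expose it to Spark SQL as a temp view
            // ("lines" is a hypothetical view name)
            DataFrame dataFrame = spark.Read().Text("input.txt");
            dataFrame.CreateOrReplaceTempView("lines");

            // Invoke the registered UDF by name from a SQL query;
            // Text() yields a single string column called "value"
            DataFrame greeted = spark.Sql(
                "SELECT UDF_GetName(value) AS greeting FROM lines");
            greeted.Show();

            // Stop Spark session
            spark.Stop();
        }

        public static string GetName(string name)
        {
            return "Hello " + name;
        }
    }
}
```

The UDF is registered only for the session created by this app, which is consistent with the reply above: a notebook attached to the cluster runs in a different session and language runtime, so it would not see `UDF_GetName`.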