As per title, I would like to request a calculation to a Spark cluster (local/HDInsight in Azure) and get the results back from a C# application.
I acknowledged the existence of Livy which I understand is a REST API application sitting on top of Spark to query it, and I have not found a standard C# API package. Is this the right tool for the job? Is it just missing a well known C# API?
The Spark cluster needs to access Azure Cosmos DB, therefore I need to be able to submit a job including the connector jar library (or its path on the cluster driver) in order for Spark to read data from Cosmos.
As a .NET Spark connector to query data did not seem to exist I wrote one
https://github.com/UnoSD/SparkSharp
It is just a quick implementation, but it does have also a way of querying Cosmos DB using Spark SQL
It's just a C# client for Livy but it should be more than enough.
using (var client = new HdInsightClient("clusterName", "admin", "password"))
using (var session = await client.CreateSessionAsync(config))
{
var sum = await session.ExecuteStatementAsync<int>("val res = 1 + 1\nprintln(res)");
const string sql = "SELECT id, SUM(json.total) AS total FROM cosmos GROUP BY id";
var cosmos = await session.ExecuteCosmosDbSparkSqlQueryAsync<IEnumerable<Result>>
(
"cosmosName",
"cosmosKey",
"cosmosDatabase",
"cosmosCollection",
"cosmosPreferredRegions",
sql
);
}