sql stored-procedures apache-spark spark-graphx

Combine SparkSQL and GraphX


Can you create a stored procedure in SparkSQL that calls the GraphX API? Something like this:

registerFunction("storedProcedureGraphX", model.storedProcedureGraphX _)

select * from someTable where storedProcedureGraphX(nodeX, nodeY) > 10


Solution

  • If by GraphX API you mean any operation on an RDD, then the answer is no. That would mean launching a new Spark job for each row, which definitely isn't a good idea. You'd also have to close over the SparkContext, which is not serializable, while functions used as UDFs have to be.
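
One way around this, if it fits your use case, is to run the graph computation once up front, register the result as a table, and join against it instead of calling GraphX from inside a UDF. The following is only a rough sketch under several assumptions: it uses the Spark 2.x+ `SparkSession` API rather than the older `registerFunction` call from the question, it uses GraphX's `ShortestPaths` as a stand-in for whatever `storedProcedureGraphX` is supposed to compute, and the toy edges, the `distances` view, and the names `GraphThenSql` and `pairsFartherThan` are all made up for illustration.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.graphx.lib.ShortestPaths
import org.apache.spark.sql.SparkSession

object GraphThenSql {
  // Hypothetical helper: returns the (nodeX, nodeY) rows of a toy table
  // whose precomputed hop distance is greater than minHops.
  def pairsFartherThan(spark: SparkSession, minHops: Int): Seq[(Long, Long)] = {
    import spark.implicits._

    // Toy directed chain 1 -> 2 -> 3 -> 4 (illustrative data only).
    val edges = spark.sparkContext.parallelize(
      Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 4L, 1)))
    val graph = Graph.fromEdges(edges, defaultValue = 0)

    // Step 1: run the graph algorithm once for the whole graph,
    // then flatten the per-vertex maps into (nodeX, nodeY, hops) rows.
    val landmarks: Seq[VertexId] = Seq(4L)
    val distances = ShortestPaths.run(graph, landmarks).vertices
      .flatMap { case (src, hopsByLandmark) =>
        hopsByLandmark.toSeq.map { case (dst, hops) => (src, dst, hops) }
      }
      .toDF("nodeX", "nodeY", "hops")
    distances.createOrReplaceTempView("distances")

    // Step 2: a stand-in for someTable from the question.
    Seq((1L, 4L), (3L, 4L)).toDF("nodeX", "nodeY")
      .createOrReplaceTempView("someTable")

    // Step 3: plain SQL join; no UDF and no SparkContext inside a closure.
    spark.sql(
      s"""SELECT t.nodeX, t.nodeY
         |FROM someTable t
         |JOIN distances d ON t.nodeX = d.nodeX AND t.nodeY = d.nodeY
         |WHERE d.hops > $minHops""".stripMargin)
      .as[(Long, Long)]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("graphx-sql-sketch")
      .getOrCreate()
    try pairsFartherThan(spark, minHops = 2).foreach(println)
    finally spark.stop()
  }
}
```

The trade-off is that the graph result has to be computable ahead of time for every node pair the query might ask about. If it is small enough, broadcasting it as a plain map and looking it up inside a UDF might work as well, since a plain map is serializable.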