apache-spark cassandra apache-spark-sql cql cqlsh

org.apache.spark.sql.AnalysisException: Undefined function: 'ano'


I get this error in Spark 3.0.0:

ERRO1:

org.apache.spark.sql.AnalysisException: Undefined function: 'ano'. This function is neither a registered temporary function nor a permanent function registered in the database 'sspkeyspace'.; line 1 pos 58

I created a UDF in Cassandra 3.11.9 to extract the year from a date column:

CREATE OR REPLACE FUNCTION ano (input DATE)
RETURNS NULL ON NULL INPUT RETURNS TEXT
LANGUAGE java AS 'return input.toString().substring(0,4);';

I ran the query in the cqlsh prompt:

select  ano(data_compra) as ano from Compras ;

and it worked well. However, when the same query runs in Spark, inside the application, it fails with ERRO1.

Query result in the cqlsh prompt:

ano
-----
2014
2009
2013
2012
2014
2012
2011
2019

Thanks,


Solution

  • This will not work. When you execute select ano(data_compra) as ano from Compras in Spark, Spark SQL parses the query itself and looks for ano among its own registered functions. It knows nothing about the Cassandra UDF, hence the AnalysisException.

    Unfortunately, exposing a Cassandra UDF to Spark SQL would require writing code for the Spark Cassandra Connector itself. It is better to re-implement the functionality in Spark: replace the call to ano with a call to Spark's built-in substring, which may even perform better.
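As a minimal sketch of that replacement, the query could be rewritten with Spark SQL built-ins only. It assumes the Compras table is already resolvable from Spark SQL under that name (for example through a temporary view or a configured Cassandra catalog), which the question does not show. Note that Spark's substring is 1-based, unlike the Java substring(0,4) used in the Cassandra UDF.

```sql
-- Assumption: Compras is visible to Spark SQL under this name.
-- Cast the date to a string and take the first four characters,
-- returning text like the Cassandra UDF did.
SELECT substring(CAST(data_compra AS STRING), 1, 4) AS ano FROM Compras;

-- Alternative: the built-in year() function returns the year as an integer.
SELECT year(data_compra) AS ano FROM Compras;
```

The first form keeps the result as text, matching the UDF's RETURNS TEXT. The second is simpler if an integer year is acceptable.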