Search code examples
apache-sparkapache-spark-sqllooker

Oracle INSTR equivalent in Spark SQL


I tried to replicate the oracle Instr function, but it seems to me that there are not all the arguments that exist in Oracle. I receive this error and I would like to include this transformation in a "plataforma" field in the table but I can't:

SELECT
SUBSTR(a.SOURCE, 0, INSTR(a.SOURCE, '-', 1, 2) - 1) AS plataforma,
COUNT(*) AS qtd
FROM db1.table AS as a
LEFT JOIN db1.table2 AS b ON a.ID=b.id
GROUP BY SUBSTR(a.SOURCE, 0, INSTR(a.SOURCE, '-', 1, 2) - 1)
ORDER BY qtd

The Apache Spark 2.0 database encountered an error while running this query. Error running query: org.apache.spark.sql.AnalysisException: Invalid number of arguments for function instr. Expected: 2; Found: 4; line 8 pos 45

I made the transformation of the field that way but I don't know if it is the correct one:

enter image description here

How can I replicate the same Oracle function in Spark? I need to do just this:

Source:

apache-spark-sql
sql-server-dw

Result:

apache-spark
sql-server

Solution

  • What you're looking for is substring_index function :

    substring_index('apache-spark-sql', '-', 2)
    

    It returns the substring before 2 occurrences of -.

    I suppose you want to get the substring before the last occurrence of -. So you can count the number of - in the input string and combine it with substring_index function like this:

    substring_index(col, '-', size(split(col, '-')) - 1)
    

    Where size(split(col, '-')) - 1 gives the number of occurences of -.