Search code examples
sqlapache-spark-sqlcasecase-statement

case statement in Spark SQL


I am working on a workflow for my company. Therefore I need to use a Spark SQL case-statement to filter something.

I have a column called OPP_amount_euro (the amount of money used for something is saved there) and I have a column called OPP_amount_euro_binned (default value is 1). So I want to program some kind of interval. If the value in OPP_amount_euro is < 30000 the value in OPP_amount_euro_binned should be 1, and so on.

I already tried to find a solution, but it´s not the best one.

select
case when OPP_amount_eur < 30000 then 1
when OPP_amount_eur >= 30000 then 2
when OPP_amount_eur >= 50000 then 3
when OPP_amount_eur >= 100000 then 4
when OPP_amount_eur >= 300000 then 5
when OPP_amount_eur >= 500000 then 6
when OPP_amount_eur >= 1000000 then 7
end as OPP_amount_eur_binned
from inputTable

so this code is working well but I can´t select any further columns in the table. If I write a '*' after the select I will get following error message:

Exception in processing: ParseException: mismatched input 'when' expecting {, ',', 'FROM', 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 2, pos 5) == SQL == Select * case when OPP_amount_eur < 30000 then 1 -----^^^ when OPP_amount_eur >= 30000 then 2 when OPP_amount_eur >= 50000 then 3 when OPP_amount_eur >= 100000 then 4 when OPP_amount_eur >= 300000 then 5 when OPP_amount_eur >= 500000 then 6 when OPP_amount_eur >= 1000000 then 7 end as OPP_amount_eur_binned from temptable3083b308bcec4124b6a4650f2bb40695

Why I can´t do this? I searched on the internet for it and in normal SQL it seems to work, why this is not possible in Spark SQL? Is there any solution?

I am sorry for my bad description, but I am absolutely new here and also I never had contact with Spark SQL. I am in my traineeship as a student.


Solution

  • This is the solution for my problem

     Select inputTable.*,
    
    case 
         when OPP_amount_eur between 0 and 30000 then 1
         when OPP_amount_eur between 30000 and 50000 then 2
         when OPP_amount_eur between 50000 and 100000 then 3
         when OPP_amount_eur between 100000 and 300000 then 4
         when OPP_amount_eur between 300000 and 500000 then 5
         when OPP_amount_eur between 500000 and 1000000 then 6
         else '7'
    
         end as OPP_amount_eur_binned
    
    from inputTable