Tags: mysql, sql, pyspark, apache-spark-sql

Multiple conditions on the same column in SQL or PySpark


I have a dataset below:

|acc_no|acc_type_id|bal_amt|bal_type_id|curr_code|forex|
|------|-----------|-------|-----------|---------|-----|
|123|1|123.45|1|USD|1.0|
|123|1|124.0|2|USD|1.0|
|123|1|200.56|3|USD|1.0|
|124|2|34500.0|1|INR|0.014|
|124|2|42000.0|3|INR|0.014|
|125|1|470.0|2|USD|1.0|
|125|1|470.0|3|USD|1.0|
|126|1|0.0|1|USD|1.0|
|126|2|370.0|3|USD|1.0|

I need to select the rows where bal_type_id = 3, but only for those acc_no values whose bal_type_id = 1 row has bal_amt > 0.

Output Dataset :

|acc_no|acc_type_id|bal_amt|bal_type_id|curr_code|forex|
|------|-----------|-------|-----------|---------|-----|
|124|2|42000.0|3|INR|0.014|
|123|1|200.56|3|USD|1.0|

How can I do this? Please help.


Solution

  • You can use window functions:

    select t.*
    from (select t.*,
                 -- carry the bal_type_id = 1 balance onto every row of the account
                 -- (null for accounts that have no bal_type_id = 1 row)
                 max(case when bal_type_id = 1 then bal_amt end) over (partition by acc_no) as bal_amt_1
          from t
         ) t
    where bal_amt_1 > 0 and bal_type_id = 3;
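
    Since the question also asks about PySpark, the same window-function idea can be written with the DataFrame API. This is a minimal sketch, assuming the data is already loaded in a DataFrame named df (the name is illustrative, not from the question):

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    # df is assumed to hold the dataset from the question
    w = Window.partitionBy("acc_no")

    # copy the bal_type_id = 1 balance onto every row of the account;
    # max() ignores the nulls that when() produces for the other balance types
    bal_amt_1 = F.max(F.when(F.col("bal_type_id") == 1, F.col("bal_amt"))).over(w)

    result = (df.withColumn("bal_amt_1", bal_amt_1)
                .where((F.col("bal_amt_1") > 0) & (F.col("bal_type_id") == 3))
                .drop("bal_amt_1"))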
    

    Or, you can use exists:

    select t.*
    from t
    where t.bal_type_id = 3 and
          exists (select 1
                  from t t2
                  where t2.acc_no = t.acc_no and
                        t2.bal_type_id = 1 and
                        t2.bal_amt > 0
                 );
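
    The exists version maps naturally to a left-semi join in PySpark. Again a sketch, assuming a DataFrame named df:

    from pyspark.sql import functions as F

    # accounts whose bal_type_id = 1 row has a positive balance
    qualifying = (df.where((F.col("bal_type_id") == 1) & (F.col("bal_amt") > 0))
                    .select("acc_no"))

    # left_semi behaves like EXISTS: it keeps rows of df that have a match
    # in qualifying, without duplicating them
    result = (df.where(F.col("bal_type_id") == 3)
                .join(qualifying, on="acc_no", how="left_semi"))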