Tags: scala, apache-spark, apache-spark-sql

Converting a SQL query to Spark


I have a SQL query that I want to convert to Spark Scala:

SELECT aid, DId, BM, BY
FROM (SELECT DISTINCT aid, DId, BM, BY, TO FROM SU WHERE cd = 2) t
GROUP BY aid, DId, BM, BY HAVING COUNT(*) > 1;

SU is my DataFrame, and I can run the query like this:

sqlContext.sql("""
  SELECT aid, DId, BM, BY
  FROM (SELECT DISTINCT aid, DId, BM, BY, TO FROM SU WHERE cd = 2) t
  GROUP BY aid, DId, BM, BY HAVING COUNT(*) > 1
""")

Instead of going through SQL, I want to express the same query directly against my DataFrame using the DataFrame API.


Solution

  • This should be the DataFrame equivalent:

    // The $-syntax requires: import sqlContext.implicits._
    SU.filter($"cd" === 2)                     // WHERE cd = 2
      .select("aid", "DId", "BM", "BY", "TO")  // inner SELECT's column list
      .distinct()                              // SELECT DISTINCT
      .groupBy("aid", "DId", "BM", "BY")       // GROUP BY aid, DId, BM, BY
      .count()                                 // COUNT(*), exposed as a "count" column
      .filter($"count" > 1)                    // HAVING COUNT(*) > 1
      .select("aid", "DId", "BM", "BY")        // outer SELECT's column list
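    For a quick sanity check, here is a minimal, self-contained sketch that runs the DataFrame version end to end on a few invented rows. It assumes a modern SparkSession entry point (the question uses the older sqlContext, so adapt accordingly), and the object name, column values, and expected output are made up for illustration:

    import org.apache.spark.sql.SparkSession

    object SqlToDataFrame {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("sql-to-dataframe")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Invented sample data: (1, "d1", "m", "y") has two distinct TO values
        // with cd = 2, so it is the only combination that should survive
        // HAVING COUNT(*) > 1.
        val SU = Seq(
          (1, "d1", "m", "y", "t1", 2),
          (1, "d1", "m", "y", "t2", 2),
          (1, "d1", "m", "y", "t2", 2),  // exact duplicate, removed by distinct()
          (2, "d2", "m", "y", "t1", 2),
          (3, "d3", "m", "y", "t1", 1)   // filtered out: cd != 2
        ).toDF("aid", "DId", "BM", "BY", "TO", "cd")

        val result = SU.filter($"cd" === 2)
          .select("aid", "DId", "BM", "BY", "TO")
          .distinct()
          .groupBy("aid", "DId", "BM", "BY")
          .count()
          .filter($"count" > 1)
          .select("aid", "DId", "BM", "BY")

        result.show()  // expect a single row: (1, d1, m, y)

        spark.stop()
      }
    }

    Because TO is dropped from the grouping keys but kept through distinct(), the group count effectively counts distinct TO values per (aid, DId, BM, BY) combination, which is exactly what the HAVING COUNT(*) > 1 clause keys on.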