scala, apache-spark-sql, comparison

Why doesn't a '==' comparison on a column work in Spark SQL?


I have a simple Spark statement, but it returns false where I expect true:

spark.sql("SELECT 1 AS a").withColumn("b", lit($"a" == 1)).show
+---+-----+
|  a|    b|
+---+-----+
|  1|false|
+---+-----+

I've also tried $"a" == lit(1), $"a".equals(1), etc., but they all return false.
$"a" >= 1 returns true, so why doesn't $"a" == 1?


Solution

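  • The === operator is defined by Spark on the Column class, not by Scala itself. Its name resembles JavaScript's strict-equality operator, but here it builds a SQL expression: $"a" === 1 returns a new Column holding the result of comparing the column's values with 1, row by row. The equalTo method is the method-call form of the same test. Scala's == operator, by contrast, calls the equals method, which returns a plain Boolean on the driver. Column.equals only checks whether the other object is a Column with an equal underlying expression, so comparing a Column with the Int 1, or with lit(1), is false. lit($"a" == 1) is therefore just lit(false), a constant column. $"a" >= 1 works because >= is defined on Column and returns a Column. Have a look at the Spark API docs for these methods in the Column class: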

    https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/sql/Column.html#equalTo-java.lang.Object-
    https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/sql/Column.html#equals-java.lang.Object-
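
To make the difference concrete, here is a minimal sketch of the fix, assuming a local SparkSession. The object name ColumnEqualityDemo and the helper withEqualityFlag are made up for illustration; only ===, equalTo, col and withColumn come from Spark itself.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ColumnEqualityDemo {
  // Hypothetical helper: adds a Boolean column "b" that is true where a = 1.
  def withEqualityFlag(df: DataFrame): DataFrame =
    df.withColumn("b", col("a") === 1) // === builds a Column expression, evaluated per row

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local mode, for demonstration only
      .appName("column-equality-demo")
      .getOrCreate()

    val df = spark.sql("SELECT 1 AS a")

    // Scala's == calls Column.equals and yields a plain Boolean on the driver.
    // The compiler may warn that this comparison is always false.
    val driverSide: Boolean = col("a") == 1
    println(s"col(a) == 1 evaluates to: $driverSide")

    // Column "b" is true for the single row.
    withEqualityFlag(df).show()

    // equalTo is the method-call form of the same test.
    df.withColumn("b", col("a").equalTo(1)).show()

    spark.stop()
  }
}
```

Both withColumn calls should give b = true for the single row, while the println reports false, which is the value that lit(...) wrapped in the original statement.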