scala, apache-spark

spark scala extraneous input '>' ParseException in transform


I'm trying to execute the query below:

finvInventoryAllDf
      .groupBy("Site_siteId")
      .agg(
        collect_set(
          array(
            "InstalledOffer_applicationSource",
            "InstalledOffer_standardStatus", "InstalledOffer_installedOfferId"
          )
        ).as("array")
      )
      .withColumn("indicator", expr(transformExpr))

But I get an error from expr(transformExpr). The value of transformExpr I'm trying to execute is:

val transformExpr = "transform(array, x -> array_contains(x, 'CIBASE') and array_contains(x, 'ACTIVE'))"

I get a ParseException saying that the input '>' in the -> operator above is extraneous.

Below is a screenshot of the console log showing the ParseException.

The Scala version I'm using is 2.11.8 and the Spark version is 3.17.2.


Solution

  • You named one of your columns array, but array is also a built-in function in Spark SQL.

    Just rename your column to something else and your code will work:

    import org.apache.spark.sql.functions.{array, collect_set, expr}

    val transformExpr = "transform(ar, x -> array_contains(x, 'CIBASE') and array_contains(x, 'ACTIVE'))"

    finvInventoryAllDf
      .groupBy("Site_siteId")
      .agg(
        collect_set(
          array(
            "InstalledOffer_applicationSource",
            "InstalledOffer_standardStatus",
            "InstalledOffer_installedOfferId"
          )
        ).as("ar")
      )
      .withColumn("indicator", expr(transformExpr))