
Spark concatenating strings using withColumn()


So I have the given dataframe:

+--------------------+-------------+
|           entity_id|        state|
+--------------------+-------------+
|             ha_tdeg|         39.9|
|         memory_free|       1459.4|
|            srv_tdeg|         39.0|
|       as_tempera...|          9.5|
|         as_humidity|        81.71|
|         as_pressure|      1003.35|
|      as_am_humidity|        22.16|
|      as_pm_humidity|         4.64|
|         memory_free|       1460.0|
|             ha_tdeg|         38.0|
|         memory_free|       1459.3|
+--------------------+-------------+

I'm trying to add a percentage sign to every "state" value where "entity_id" contains 'humidity'. As seen in the code below, I cast the "state" column to String before working with it. But whenever I execute the command below and try to concatenate '%' (or any other string), all the values become null. What I find interesting is that if I try to concatenate a number wrapped as a String ("10"), it performs a mathematical addition instead.

What is the way to overcome this issue?

Here is the code I'm using:

var humidityDF = df.filter(df("entity_id").contains("humidity") && df("state").isNotNull)
humidityDF = humidityDF.withColumn("state", humidityDF("state").cast("String"))
humidityDF = humidityDF.withColumn("state", col("state") + "%")

I tried:

humidityDF = humidityDF.withColumn("state", col("state").toString + "%")

But this doesn't work since 'withColumn' accepts only Column type parameters.


Solution

The `+` operator on a `Column` means arithmetic addition: Spark casts both operands to a numeric type, and since "%" (or any non-numeric string) casts to null, the whole expression becomes null. That also explains the "10" case — it casts cleanly to a number and gets added. For string concatenation, use the `concat` function together with `lit`:

  • import org.apache.spark.sql.functions.{lit, concat}
    humidityDF = humidityDF.withColumn("state", concat(col("state"), lit("%")))
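A minimal self-contained sketch of the whole pipeline (the column names come from the question; the sample rows and the local SparkSession are assumptions for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, lit}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("humidity-concat")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data mirroring the dataframe in the question
val df = Seq(
  ("ha_tdeg", "39.9"),
  ("as_humidity", "81.71"),
  ("as_pressure", "1003.35")
).toDF("entity_id", "state")

// Keep only humidity rows, then append "%" via concat + lit
// (string concatenation, not the numeric `+` operator)
val humidityDF = df
  .filter(col("entity_id").contains("humidity") && col("state").isNotNull)
  .withColumn("state", concat(col("state").cast("string"), lit("%")))
```

After this, `humidityDF` contains `as_humidity | 81.71%` while the non-humidity rows are filtered out.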