
Use the length function inside substring in Spark


I am trying to use the length function inside a substring function on a DataFrame, but it gives an error:

val substrDF = testDF.withColumn("newcol", substring($"col", 1, length($"col")-1))

Below is the error:

 error: type mismatch;
 found   : org.apache.spark.sql.Column
 required: Int

I am using Spark 2.1.


Solution

  • The `expr` function can be used:

    import org.apache.spark.sql.functions.expr
    import spark.implicits._  // needed for toDF

    val data = List("first", "second", "third")
    val df = data.toDF("value")
    val result = df.withColumn("cutted", expr("substring(value, 1, length(value)-1)"))
    result.show(false)
    
    

    output:

    +------+------+
    |value |cutted|
    +------+------+
    |first |firs  |
    |second|secon |
    |third |thir  |
    +------+------+
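
The original call fails because `functions.substring(str: Column, pos: Int, len: Int)` requires plain `Int` arguments for the position and length, while `length($"col") - 1` is a `Column`. As an alternative to `expr`, `Column.substr` has an overload that accepts `Column` arguments directly. A sketch, assuming the same `df` with a `value` column as above:

```scala
import org.apache.spark.sql.functions.{length, lit}

// Column.substr(startPos: Column, len: Column) accepts Column expressions,
// so the dynamic length works without falling back to a SQL string.
val result2 = df.withColumn("cutted", $"value".substr(lit(1), length($"value") - 1))
result2.show(false)
```

This produces the same `cutted` column as the `expr` version, while keeping the expression type-checked in Scala rather than parsed from a SQL string.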