dataframe scala apache-spark apache-spark-sql

How to trim the column values using Spark Dataframe

I have a dataframe like below, and I need to trim the values in SCHDULE column using spark dataframe. I tried with UDF functions but I didn't get expected output

SCHDULE	ID	VALUE
100H/10AR1	KL01	30
100H/10TR2	KL01	40
100H/22TR1	KL01	20
100H/22TR2	KL01	20
105JK/12PK1	AA05	10
105JK/12PK2	AA05	20
105JH/33PK3	AA05	50
105JH/33PK4	AA05	30
110P/1	BR03	20
110P/2	BR03	10

I need output like the below dataframe, can anyone pls help me on this

SCHDULE	ID	VALUE
100H/10AR1	KL01	30
100H/10TR2	KL01	40
100H/22TR1	KL01	20
100H/22TR2	KL01	20
105JK/12PK1	AA05	10
105JK/12PK2	AA05	20
105JH/33PK3	AA05	50
105JH/33PK4	AA05	30
110P/1	BR03	20
110P/2	BR03	10

Solution

Probably you dont need udf here, use function from Spark API, in this case regexp_extract may be usefull, below you can find sample code and regexp

import org.apache.spark.sql.functions._

val inputData = Seq(
  "100H/10AR1",
  "105J/33PK4",
  "110P/1"
)

val inputDf = inputData.toDF("SCHDULE")
inputDf.withColumn("Trimmed", regexp_extract($"SCHDULE","""^(\d+[A-Z]?\/\d+).*""",1)).show

Output:

+----------+-------+
|   SCHDULE|Trimmed|
+----------+-------+
|100H/10AR1|100H/10|
|105J/33PK4|105J/33|
|    110P/1| 110P/1|
+----------+-------+