
How to access data in a Spark Dataset Column


I have a DataFrame like this:

+------+---+
|  Name|Age|
+------+---+
|A-2   | 26|
|B-1   | 30|
|C-3   | 20|
+------+---+

scala> p.select("Name", "Age")
res2: org.apache.spark.sql.DataFrame = [Name: string, Age: string]

We can clearly see here that the data in both columns is of type String.

I want to transform the Name column with a `split("-")`-like method to keep only the first part of it (i.e. A, B, C). But the `Column` type in Spark doesn't have such a method, so I'm wondering how to get at the String inside the Column so I can perform the split operation.

Does anyone know what I should do?


Solution

  • Use the `split` method from `org.apache.spark.sql.functions`, which operates on Columns directly (no need to extract the underlying String), then `getItem(0)` to take the first element of the resulting array:

    import org.apache.spark.sql.functions.{col, split}

    df.select(split(col("Name"), "-").getItem(0))
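A fuller sketch of the approach, assuming a local `SparkSession` and a DataFrame built from the sample data in the question (the session setup and variable names are illustrative, not from the original post):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, split}

val spark = SparkSession.builder()
  .appName("split-example")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Sample data matching the question's DataFrame
val df = Seq(("A-2", "26"), ("B-1", "30"), ("C-3", "20")).toDF("Name", "Age")

// split returns an ArrayType column; getItem(0) extracts its first element.
// Overwriting "Name" via withColumn keeps the rest of the schema intact.
val result = df.withColumn("Name", split(col("Name"), "-").getItem(0))

result.show()
// +----+---+
// |Name|Age|
// +----+---+
// |   A| 26|
// |   B| 30|
// |   C| 20|
// +----+---+
```

Note that the second argument to `split` is a regular expression, so characters like `.` or `|` would need escaping; a plain `-` works as-is.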