apache-spark, apache-spark-sql, uuid

Add a new column of generated UUIDs to a DataFrame


I want to add a new column to a DataFrame in which every row holds a generated UUID.

A UUID value looks something like 21534cf7-cff9-482a-a3a8-9e7244240da7

My Research:

I've tried the withColumn method in Spark.

val DF2 = DF1.withColumn("newcolname", DF1("existingcolname") + 1)

So DF2 will have an additional column, newcolname, whose value in every row is existingcolname plus 1.

But my requirement is a new column that holds a generated UUID for each row.


Solution

  • You should try something like this:

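    import java.util.UUID
    
    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions.udf
    
    val sc: SparkContext = ...
    val sqlContext = new SQLContext(sc)
    
    // Brings toDF and the other implicit conversions into scope
    import sqlContext.implicits._
    
    // Zero-argument UDF that returns a random UUID string for each row
    val generateUUID = udf(() => UUID.randomUUID().toString)
    val df1 = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
    val df2 = df1.withColumn("UUID", generateUUID())
    
    df1.show()
    df2.show()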
    

    The output will be:

    +---+-----+
    | id|value|
    +---+-----+
    |id1|    1|
    |id2|    4|
    |id3|    5|
    +---+-----+
    
    +---+-----+--------------------+
    | id|value|                UUID|
    +---+-----+--------------------+
    |id1|    1|f0cfd0e2-fbbe-40f...|
    |id2|    4|ec8db8b9-70db-46f...|
    |id3|    5|e0e91292-1d90-45a...|
    +---+-----+--------------------+
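
A note for newer Spark versions: the following is a minimal sketch, assuming Spark 2.3 or later, of two variations on the snippet above. It marks the UDF with asNondeterministic() so the optimizer does not assume repeated calls return the same value, and it also shows the built-in SQL function uuid(), which needs no UDF. The object name UuidColumnSketch, the helper withUuidColumns, the column names uuid_udf and uuid_builtin, and the local master setting are illustrative assumptions, not part of the original answer.

```scala
import java.util.UUID

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{expr, udf}

object UuidColumnSketch {
  // Marked nondeterministic so the optimizer does not treat repeated
  // invocations as interchangeable (available since Spark 2.3).
  private val generateUUID =
    udf(() => UUID.randomUUID().toString).asNondeterministic()

  // Hypothetical helper: adds one UUID column produced by the UDF and
  // one produced by the built-in SQL function uuid().
  def withUuidColumns(df: DataFrame): DataFrame =
    df.withColumn("uuid_udf", generateUUID())
      .withColumn("uuid_builtin", expr("uuid()"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("uuid-column-sketch")
      .getOrCreate()
    import spark.implicits._

    val df1 = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")

    // Pass false to show the full UUID strings instead of truncating them.
    withUuidColumns(df1).show(false)

    spark.stop()
  }
}
```

Either way, the values are generated when the plan is evaluated, so they can differ from one action to the next. If the UUIDs need to stay stable, persist the DataFrame or write it out before reusing it.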