How do I generate rows depending on a column value in Spark?


Suppose I have a DataFrame with a single column and one row:

+---+
| id|
+---+
|  4|
+---+

How do I generate rows based on the value in that column, so that the result looks like this?

+---+
| id|
+---+
|  1|
|  2|
|  3|
|  4|
+---+

Solution

  • You can define a UDF that builds the range from 1 to the column value, then use the explode function to turn each element of that array into a separate row:

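    import org.apache.spark.sql.functions._

    // UDF that turns an integer n into the array [1, 2, ..., n]
    val generateUdf = udf((column: Int) => (1 to column).toArray)

    // explode emits one output row per array element
    df.withColumn("id", explode(generateUdf(col("id")))).show(false)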
    

    This should give you:

    +---+
    |id |
    +---+
    |1  |
    |2  |
    |3  |
    |4  |
    +---+
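
    As an alternative to the UDF, here is a minimal sketch that uses the built-in sequence function, assuming Spark 2.4 or later (where sequence was added to org.apache.spark.sql.functions). The object name GenerateRows, the helper expandIds, and the local SparkSession settings are hypothetical and only there to make the example self-contained. One difference to be aware of: when the start is greater than the stop, sequence counts down, so an id of 0 would produce the rows 1 and 0 rather than no rows as the UDF version does.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, lit, sequence}

// Hypothetical wrapper object, used only to make the sketch runnable.
object GenerateRows {

  // Replace each id value n with the rows 1, 2, ..., n.
  // sequence(1, n) builds the array; explode emits one row per element.
  def expandIds(df: DataFrame): DataFrame =
    df.withColumn("id", explode(sequence(lit(1), col("id"))))

  def main(args: Array[String]): Unit = {
    // Local session for illustration; a real job would reuse its own session.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("generate-rows")
      .getOrCreate()
    import spark.implicits._

    // Same input as in the question: one column, one row, value 4.
    val df = Seq(4).toDF("id")

    expandIds(df).show(false)

    spark.stop()
  }
}
```

    For the input from the question this should produce the same four rows as the UDF version. Built-in functions are generally easier for the optimizer to handle than UDFs, which is the main reason to prefer this form when your Spark version supports it.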