python, pyspark, auto-increment

How to rename column values in an incremental manner for strings in PySpark


Input dataset:

[image: input dataset]

Expected output dataset:

[image: expected output dataset]


Solution

  • from pyspark.sql import SparkSession
    from pyspark.sql import functions as sf
    from pyspark.sql.window import Window
    
    spark = SparkSession.builder.master("local").appName("test").getOrCreate()
    
    # Sample data: (Product_name, Price)
    simpleData = [
        ("XYZ", 2),
        ("XYZ", 4),
        ("XYZ", 10),
        ("ABC", 6),
        ("ABC", 8),
        ("ABC", 18),
        ("YYY", 20),
    ]
    
    columns = ["Product_name", "Price"]
    df = spark.createDataFrame(data=simpleData, schema=columns)
    
    # Number the rows within each product, ordered by price
    windowSpec = Window.partitionBy("Product_name").orderBy("Price")
    
    # Append the row number to the product name, then drop the helper column.
    # The explicit cast avoids relying on implicit int-to-string conversion.
    ndf = (
        df.withColumn("row_number", sf.row_number().over(windowSpec))
        .withColumn(
            "Product_name",
            sf.concat_ws("_", sf.col("Product_name"), sf.col("row_number").cast("string")),
        )
        .drop("row_number")
    )
    
    # show() returns None, so call it separately to keep ndf as a DataFrame
    ndf.show()
    

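    Output: each Product_name value gets a suffix _1, _2, ... in ascending Price order within its group, for example XYZ_1, XYZ_2, XYZ_3, and YYY_1 for the single YYY row.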
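
    If it helps to see what the window is doing, here is a rough plain-Python sketch of the same per-group numbering, with no Spark involved. The function name `number_within_groups` is made up for illustration and is not part of PySpark; it just mimics partitioning by name, ordering by price, and appending a 1-based counter.

```python
from collections import defaultdict


def number_within_groups(rows):
    """Mimic partitionBy(name).orderBy(price) + row_number() in plain Python.

    `rows` is an iterable of (name, price) pairs. Hypothetical helper,
    for illustration only.
    """
    # Collect prices per name; dicts keep first-seen order of the names
    groups = defaultdict(list)
    for name, price in rows:
        groups[name].append(price)

    result = []
    for name, prices in groups.items():
        # Sort within the group, then number from 1 like row_number()
        for i, price in enumerate(sorted(prices), start=1):
            result.append((f"{name}_{i}", price))
    return result


rows = [("XYZ", 2), ("XYZ", 4), ("XYZ", 10),
        ("ABC", 6), ("ABC", 8), ("ABC", 18),
        ("YYY", 20)]

for row in number_within_groups(rows):
    print(row)
```

    One difference to keep in mind: this sketch returns groups in first-seen order, whereas Spark does not guarantee any particular order of the partitions in the result unless you sort explicitly.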