
Reorder a string and remove a specific character using PySpark


Let's say I have a string column like the one below:

Date
03/2024
07/2024
12/2024
06/2024
01/2024

I want to reorder the string so the year comes first, and remove the "/" in the middle:

Date
202403
202407
202412
202406
202401

Please help me!


Solution

  • If the column is not a real date type, you can treat it as a string: split it on "/" and concatenate the two parts in reverse order.

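    from pyspark.sql import SparkSession
    import pyspark.sql.functions as f
    from pyspark.sql.types import StringType

    # Reuse the active session, or create one when running outside a shell/notebook
    spark = SparkSession.builder.getOrCreate()

    # A bare StringType schema yields a single column named "value"
    df = spark.createDataFrame(
        ["03/2024",
         "07/2024",
         "12/2024",
         "06/2024",
         "01/2024"],
        StringType())

    # Split "MM/yyyy" on "/" into an array: [month, year]
    df = df.withColumn("Split", f.split(df.value, "/"))
    # Concatenate year (index 1) followed by month (index 0)
    df.withColumn("Ordered", f.concat(df.Split[1], df.Split[0])).show()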
    
    +-------+----------+-------+
    |  value|     Split|Ordered|
    +-------+----------+-------+
    |03/2024|[03, 2024]| 202403|
    |07/2024|[07, 2024]| 202407|
    |12/2024|[12, 2024]| 202412|
    |06/2024|[06, 2024]| 202406|
    |01/2024|[01, 2024]| 202401|
    +-------+----------+-------+