Manipulating a string in PySpark if it starts with specific characters

I have this dataframe with a column of strings:

Column A
AB-001-1-12345-A
AB-001-1-12346-B
ABC012345B
ABC012346B

In PySpark, I want to create a new column: if a string starts with "AB-", the new column should drop the "AB-" prefix and keep the rest of the characters. Otherwise, the string should remain the same.

Expected Output:

Column A	Column B
AB-001-1-12345-A	001-1-12345-A
AB-001-1-12346-B	001-1-12346-B
ABC012345B	ABC012345B
ABC012346B	ABC012346B

Solution

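Assuming the column is named col_a, you can use when with startswith to detect the "AB-" prefix and split to take the text after it. Rows without the prefix fall through to otherwise and are kept unchanged.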

from pyspark.sql.functions import col, split, when

# Strip a leading "AB-"; getItem(1) assumes "AB-" occurs only once in the value.
df = df.withColumn("col_b", when(col("col_a").startswith("AB-"), split(col("col_a"), "AB-").getItem(1)).otherwise(col("col_a")))
df.show()

Output

+----------------+-------------+
|           col_a|        col_b|
+----------------+-------------+
|AB-001-1-12345-A|001-1-12345-A|
|AB-001-1-12346-B|001-1-12346-B|
|      ABC012345B|   ABC012345B|
|      ABC012346B|   ABC012346B|
+----------------+-------------+
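
One caveat worth checking: splitting on "AB-" and taking element 1 returns only the text up to the next "AB-", so a value that contains the prefix a second time would be truncated. The plain-Python sketch below mirrors the per-row rule so it can be tried without a Spark session. The helper names and the third sample value are hypothetical and not part of the original data. In Spark, the anchored variant would probably correspond to regexp_replace(col("col_a"), "^AB-", ""), which is offered here as an alternative, not as the method used in the answer above.

```python
import re

PREFIX = "AB-"  # prefix taken from the question


def strip_by_split(value: str) -> str:
    """Mirror of the split-based rule: element 1 after splitting on the prefix."""
    if value.startswith(PREFIX):
        return value.split(PREFIX)[1]
    return value


def strip_anchored(value: str) -> str:
    """Remove the prefix only where it appears at the start of the string."""
    return re.sub(r"^AB-", "", value)


# The last sample is a made-up value with a repeated prefix.
samples = ["AB-001-1-12345-A", "ABC012345B", "AB-001-AB-7"]
for s in samples:
    print(s, "|", strip_by_split(s), "|", strip_anchored(s))
```

Both helpers agree on the values from the question. They differ only on the made-up value with a repeated "AB-", where the anchored version keeps everything after the first prefix.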