python, pyspark, databricks

Replace the first occurrence of a character in a PySpark DataFrame


I know this is a very basic question, but is there a way to replace only the first occurrence of a character in a PySpark DataFrame column?

I have the following value in a DataFrame:

Gourav#Joshi#Karnataka#US#English

I want to replace only the first occurrence of # with a space.

Expected Output:

Gourav Joshi#Karnataka#US#English

Solution

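  • Use regexp_replace: capture the substring before the first # as $1, then write it back followed by a space. Because the pattern is anchored with ^, it can match only once, so the later # characters are left alone: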

    spark.sql("""
        select col, regexp_replace(col,'^([^#]*)#','$1 ') col_new
        from values ('Gourav#Joshi#Karnataka#US#English') as (col)
    """).show(1,0)
    +---------------------------------+---------------------------------+
    |col                              |col_new                          |
    +---------------------------------+---------------------------------+
    |Gourav#Joshi#Karnataka#US#English|Gourav Joshi#Karnataka#US#English|
    +---------------------------------+---------------------------------+
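
If you prefer the DataFrame API over spark.sql, the same pattern can be passed to pyspark.sql.functions.regexp_replace. The sketch below is only an illustration: the helper names (replace_first_hash, with_first_hash_replaced) and the column names are hypothetical, not part of the question. It also includes a plain-Python check of the pattern with the re module, which writes the group reference as \1 where Spark's Java-style replacement string uses $1.

```python
import re

# Same pattern as in the SQL answer: everything up to the first '#',
# anchored at the start of the string so it can match at most once.
PATTERN = r'^([^#]*)#'


def replace_first_hash(value):
    """Plain-Python check of the pattern (hypothetical helper).

    Python's re module uses \\1 for the group reference; Spark uses $1.
    """
    return re.sub(PATTERN, r'\1 ', value, count=1)


def with_first_hash_replaced(df, col_name, new_col):
    """DataFrame API equivalent of the SQL answer (hypothetical helper).

    Assumes `df` is a Spark DataFrame with a string column `col_name`.
    pyspark is imported here so the plain-Python check above can run
    without a Spark installation.
    """
    from pyspark.sql import functions as F
    return df.withColumn(new_col, F.regexp_replace(col_name, PATTERN, '$1 '))


if __name__ == "__main__":
    print(replace_first_hash('Gourav#Joshi#Karnataka#US#English'))
    # prints: Gourav Joshi#Karnataka#US#English
```

With an active SparkSession, something like with_first_hash_replaced(df, "col", "col_new").show(truncate=False) should give the same table as above. Values that contain no # are returned unchanged, since the pattern then does not match.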