
Replacing dots with commas on a pyspark dataframe


I'm using the code below to collect some info:

from pyspark.sql.functions import col, date_format, date_trunc

df = (
  df
  .select(
        date_format(date_trunc('month', col("reference_date")), 'yyyy-MM-dd').alias("month"),
        col("id"),
        col("name"),
        col("item_type"),
        col("sub_group"),
        col("latitude"),
        col("longitude")
  )
)

My latitude and longitude values contain dots, like this: -30.130307 and -51.2060018, but I must replace the dot with a comma. I've tried both .replace() and regexp_replace(), but neither is working. Could you guys help me, please?


Solution

  • Take the following dataframe as an example.

    df.show()
    +-------------------+-------------------+                                       
    |           latitude|          longitude|
    +-------------------+-------------------+
    |  85.70708380916193| -68.05674981929877|
    | 57.074495803252404|-42.648691976080215|
    |  2.944303748172473| -62.66186439333423|
    | 119.76923402031701|-114.41179457810185|
    |-138.52573939229234|  54.38429596238362|
    +-------------------+-------------------+
    

    You should be able to use pyspark.sql functions like the following:

    from pyspark.sql import functions

    df = df.withColumn("longitude", functions.regexp_replace("longitude", r"[.]", ","))
    df = df.withColumn("latitude", functions.regexp_replace("latitude", r"[.]", ","))
    df.show()
    +-------------------+-------------------+
    |           latitude|          longitude|
    +-------------------+-------------------+
    |  85,70708380916193| -68,05674981929877|
    | 57,074495803252404|-42,648691976080215|
    |  2,944303748172473| -62,66186439333423|
    | 119,76923402031701|-114,41179457810185|
    |-138,52573939229234|  54,38429596238362|
    +-------------------+-------------------+
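    As a side note, DataFrame.replace (the .replace() you tried) matches whole cell values rather than substrings, which is why it appears to do nothing here. For a single-character swap like this, an alternative to regexp_replace is pyspark.sql.functions.translate, which maps each character in the matching string to the corresponding character in the replacement string, with no regex involved. A minimal sketch, assuming a local SparkSession and string-typed columns (the session setup and sample values are illustrative, not from the question):

    ```python
    from pyspark.sql import SparkSession, functions

    # Assumed local session and sample data, for illustration only
    spark = SparkSession.builder.master("local[1]").appName("dots-to-commas").getOrCreate()
    df = spark.createDataFrame(
        [("85.70708380916193", "-68.05674981929877")],
        ["latitude", "longitude"],
    )

    # translate swaps every '.' for ',' character by character; no regex escaping needed
    df = df.withColumn("latitude", functions.translate("latitude", ".", ","))
    df = df.withColumn("longitude", functions.translate("longitude", ".", ","))
    df.show()
    ```

    Either way, keep in mind the result is a string column; if the coordinates were numeric, you would cast them to string first, and you lose the ability to do arithmetic on them afterwards.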