
How to remove quotes from the start and end of a string in Scala


I have a DataFrame in which some strings have a double quote (") at the start and at the end.

For example:

+-------------------------------+
|data                           |
+-------------------------------+
|"john belushi"                 |
|"john mnunjnj"                 |
|"nmnj tyhng"                   |
|"John b-e_lushi"               |
|"john belushi's book"          |
+-------------------------------+

Expected output:

+-------------------------------+
|data                           |
+-------------------------------+
|john belushi                   |
|john mnunjnj                   |
|nmnj tyhng                     |
|John b-e_lushi                 |
|john belushi's book            |
+-------------------------------+

I want to remove only the enclosing double quotes (") from each string. Can someone tell me how to do this in Scala?

Python provides lstrip and rstrip. Is there anything equivalent in Scala?


Solution

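  • Use the expr, substring, and length functions to take the substring that starts at position 2 (positions are 1-based) and has length length(data) - 2, which drops the first and last characters.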

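    // Needed for .toDF; assumes a SparkSession named `spark`, as in spark-shell
    import spark.implicits._

    val df_d = List("\"john belushi\"", "\"John b-e_lushi\"", "\"john belushi's book\"")
      .toDF("data")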
    

    Input:

    +---------------------+
    |data                 |
    +---------------------+
    |"john belushi"       |
    |"John b-e_lushi"     |
    |"john belushi's book"|
    +---------------------+
    

    Using expr, substring and length functions:

    import org.apache.spark.sql.functions.expr
    
    df_d.withColumn("data", expr("substring(data, 2, length(data) - 2)"))
        .show(false)
    

    Output:

    +-------------------+
    |data               |
    +-------------------+
    |john belushi       |
    |John b-e_lushi     |
    |john belushi's book|
    +-------------------+
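
The substring approach drops the first and last character of every value, whether or not they are quotes. If some rows may be unquoted, a regex anchored to both ends of the string is one possible alternative. The following is a minimal sketch, not part of the original answer: it uses Spark's regexp_replace, and the object name StripOuterQuotes, the helper stripOuterQuotes, the local SparkSession setup, and the sample rows are all assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_replace}

// Hypothetical helper object, not from the original answer.
object StripOuterQuotes {

  // Removes a double quote at the start and a double quote at the end of each
  // value, only where present. Pattern: ^" (leading quote) or "$ (trailing quote).
  def stripOuterQuotes(df: DataFrame, column: String): DataFrame =
    df.withColumn(column, regexp_replace(col(column), "^\"|\"$", ""))

  def main(args: Array[String]): Unit = {
    // Assumed local session for the demo; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("strip-outer-quotes")
      .getOrCreate()
    import spark.implicits._

    // Sample rows (assumed): two quoted values and one unquoted value.
    val df = List("\"john belushi\"", "no quotes here", "\"john belushi's book\"")
      .toDF("data")

    stripOuterQuotes(df, "data").show(false)

    spark.stop()
  }
}
```

With this pattern the unquoted row is left unchanged, and quotes or apostrophes inside the string are not touched. For a plain Scala String outside Spark, s.stripPrefix("\"").stripSuffix("\"") has a similar effect.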