Search code examples
dataframehiveapache-spark-sqlhql

Remove leading and trailing chacters for hive column


I have a hive column which has unknow numbers of leading and trailing double quotes in column.The column has double quotes inside data as well.

For example column looks like this

enter image description here

I want output like below

enter image description here

I have written a pyspark code where I can just strip the " and it works,but I want a solution in hql. I have also tried regexp_replace like

regexp_replace(test,'^"|^""|""$|"$', "")

but this is kind of hardcoding. Can someone please provide a generic solution for this?


Solution

  • Try this-

    val df = spark.sql("select '\"\"\"56\"7\"' as test")
        df.show(false)
        /**
          * +--------+
          * |test    |
          * +--------+
          * |"""56"7"|
          * +--------+
          */
        df.createOrReplaceTempView("table")
        spark.sql("select test, regexp_replace(test, '^\"+|\"+$', '') as test_new from table")
          .show(false)
    
        /**
          * +--------+--------+
          * |test    |test_new|
          * +--------+--------+
          * |"""56"7"|56"7    |
          * +--------+--------+
          */