String permutation with a Spark UDF


I'm converting a Pig script to Spark 1.6 using Scala. I have a DataFrame that contains a string column, and I want to rearrange its characters in a fixed order.
Example:

+----------------+
|            Info|
+----------------+
|8106f510000dc502|
+----------------+

I want to rearrange the characters into this order (1-based positions): [3,1,5,7,6,(8-16),4,2]

+----------------+
|            Info|
+----------------+
|08f150000dc50241|
+----------------+

This is my Pig UDF in Java, and it works:

public class NormalizeLocInfo extends EvalFunc<String>
{
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try{
            char [] ca = ((String)input.get(0)).toCharArray();
            return (
                    new StringBuilder().append(ca[3]).append(ca[0]).append(ca[5]).append(ca[7]).append(ca[6]).append(ca[8]).append(ca[9]).append(ca[10])
               .append(ca[11]).append(ca[12]).append(ca[13]).append(ca[14]).append(ca[15]).append(ca[16]).append(ca[4]).toString().toUpperCase()
               );
        }catch(Exception e){throw new IOException("UDF:Caught exception processing input row :"+input.get(0), e);}
    }
  }

How can I convert it to a Spark UDF in Scala? Thank you.


Solution

  • This is how you can define a Spark UDF for your function:

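        import org.apache.spark.sql.functions._
        import scala.util.{Failure, Success, Try}

        val exec = udf((input : String) => {
          if (input == null || input.trim == "") ""
          else {
            Try{
              val ca = input.toCharArray
              // 1-based positions: 3,1,5,7,6, then 8 through 16, then 4,2
              List(3,1,5,7,6,8,9,10,11,12,13,14,15,16,4,2).map(a=>ca(a-1)).mkString
            } match{
              case Success(data) => data
              case Failure(e)  =>
                // print the failure (e.g. input shorter than 16 characters) and fall back to ""
                e.printStackTrace()
                ""
            }
          }
        })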
    

    You can apply the function with withColumn() (in Spark 1.6 the $"..." column syntax needs import sqlContext.implicits._):

    val dfNew = df.withColumn("newCol", exec($"oldCol"))
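
The reordering itself does not depend on Spark, so it can be checked on its own before wrapping it in a UDF. Below is a minimal sketch in plain Scala; the names `PermuteSketch`, `order` and `permute` are made up for illustration, and the positions are assumed to be 1-based as in the question. Note that for the sample input the last two characters come out as `61`, whereas the expected output in the question shows `41`; the input contains no `4`, so that may be a typo in the question.

```scala
import scala.util.Try

object PermuteSketch {
  // 1-based positions from the question: 3, 1, 5, 7, 6, then 8 through 16, then 4, 2
  val order: Seq[Int] = Seq(3, 1, 5, 7, 6) ++ (8 to 16) ++ Seq(4, 2)

  // Returns None for null, blank, or too-short input instead of throwing
  def permute(input: String): Option[String] =
    Option(input)
      .filter(_.trim.nonEmpty)
      .flatMap(s => Try(order.map(i => s.charAt(i - 1)).mkString).toOption)
}
```

If this behaves as expected, it could be wrapped as `udf((s: String) => PermuteSketch.permute(s).getOrElse(""))` and used with withColumn() in the same way as above.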
    

    Hope this helps