dataframe apache-spark pyspark

How to convert a column containing a sequence of digits into a sequence of letters in PySpark?


How can I convert a column containing a sequence of digits into another column holding the corresponding sequence of letters, e.g. 0 as a, 1 as b, and so on? Since the digits range from 0 to 9, the letters range from a to j. Please help.

Input

ssn_num
1342
4321
2133

Expected output

ssn_num alpha_op
1342 bdec
4321 edcb
2133 cbdd

Solution

  • Split the column into individual digits, map each digit to the ASCII code point of the matching lowercase letter (97-106, i.e. a-j), convert each code point back to a character, and join the characters into a single string.

    from pyspark.sql import functions as F
    df = df.withColumn('alpha_op', F.expr("array_join(transform(split(ssn_num, ''), x -> char(97 + cast(x as integer))), '')"))
    

    +-------+--------+
    |ssn_num|alpha_op|
    +-------+--------+
    |   1342|    bdec|
    |   4321|    edcb|
    |   2133|    cbdd|
    +-------+--------+
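
To sanity-check the mapping without a Spark session, the same digit-to-letter logic can be sketched in plain Python. This is only an illustration of the arithmetic used in the Spark expression: the helper name `digits_to_letters` is hypothetical, and it assumes every character in the value is a digit from 0 to 9.

```python
def digits_to_letters(value):
    """Map each digit 0-9 to the letter a-j, mirroring char(97 + digit)."""
    # Assumption: the input contains only the characters 0-9;
    # anything else makes int() raise a ValueError.
    return "".join(chr(97 + int(ch)) for ch in str(value))


if __name__ == "__main__":
    # Sample values taken from the question's input column.
    for ssn in ["1342", "4321", "2133"]:
        print(ssn, digits_to_letters(ssn))
```

Run on the sample input, this should print the same pairs as the `alpha_op` column above (`bdec`, `edcb`, `cbdd`), which confirms that the offset of 97 lines up 0 with a and 9 with j.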