
How to call aes_encrypt (and other Spark SQL functions) in a pyspark DataFrame context


I need to call the new Spark function aes_encrypt in a DataFrame context.

The function can be called in a SQL context like this:

SELECT *, aes_encrypt(col1, key, 'GCM') AS col1_encrypted FROM myTable

or like this:

df = sql("SELECT *, aes_encrypt(col1, key, 'GCM') AS col1_encrypted FROM myTable")

Is there any other way to call it in a DataFrame context, something like this?

from pyspark.sql.functions import aes_encrypt

df = table("myTable").withColumn("col1_encrypted", aes_encrypt("col1", key, 'GCM'))

(I know it can't be imported since it doesn't exist in pyspark; this is just an example modeled on other Spark functions that can be called this way.)


Solution

  • You can use the expr function (doc) for that: just pass the corresponding SQL expression as a string:

    from pyspark.sql.functions import expr

    df = table("myTable") \
      .withColumn("col1_encrypted", expr("aes_encrypt(col1, key, 'GCM')"))
    

    Another alternative is selectExpr (doc):

    df = table("myTable") \
      .selectExpr("*", "aes_encrypt(col1, key, 'GCM') as col1_encrypted")
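
    Putting it together, here is a minimal self-contained sketch. It assumes Spark 3.3+ (where `aes_encrypt`/`aes_decrypt` are available in SQL); the local session, sample data, and 16-byte key are illustrative, not from the question:

    ```python
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import expr

    spark = SparkSession.builder.master("local[1]").appName("aes-demo").getOrCreate()

    # Illustrative one-row DataFrame standing in for "myTable"
    df = spark.createDataFrame([("secret",)], ["col1"])

    key = "0123456789abcdef"  # AES keys must be 16, 24, or 32 bytes

    # Encrypt via a SQL expression, since aes_encrypt has no pyspark wrapper here
    encrypted = df.withColumn(
        "col1_encrypted",
        expr(f"aes_encrypt(col1, '{key}', 'GCM')"),
    )

    # Round-trip check: aes_decrypt returns binary, so cast back to string
    decrypted = encrypted.withColumn(
        "col1_decrypted",
        expr(f"aes_decrypt(col1_encrypted, '{key}', 'GCM')").cast("string"),
    )

    row = decrypted.select("col1", "col1_decrypted").first()
    spark.stop()
    ```

    Note that newer releases (Spark 3.5+) also expose `aes_encrypt` directly in `pyspark.sql.functions`, so on those versions the import from the question works as written.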