palantir-foundry foundry-code-repositories

Code Repository - What exactly is ctx in PySpark for a code repo?

I have seen ctx used in a code repo. What exactly is it? Is it a built-in library? When would I use it?

I've seen it in examples such as the following:

df = ctx.spark.createdataframe(...

Solution

For Code Repositories transforms, you can optionally declare a parameter named ctx in your compute function. It is not a separate library: it is a context object that Foundry passes in, giving you more access to the underlying infrastructure running your job. Typically, you'll use its ctx.spark_session attribute to create your own pyspark.sql.DataFrame objects from Python objects, like:

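from transforms.api import transform_df, Output
from pyspark.sql import types as T


# The output dataset is passed positionally to transform_df
@transform_df(
    Output("/my/output"),
)
def my_compute_function(ctx):
    # Foundry supplies ctx because the function declares a parameter named ctx
    schema = T.StructType(
        [
            T.StructField("name", T.StringType(), True),
        ]
    )
    # Build a one-row DataFrame from a plain Python list using the job's SparkSession
    return ctx.spark_session.createDataFrame([["Alex"]], schema=schema)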

The full API is described in the documentation for the transforms.api.TransformContext class, which lists the attributes you can read, such as spark_session and parameters.

Note: the spark_session attribute is a pyspark.sql.SparkSession, so the usual SparkSession methods are available on it.
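To make the "optional ctx parameter" idea concrete, here is a hypothetical, pure-Python sketch of how a framework can pass a context object only to compute functions that declare a parameter named ctx, by inspecting the function's signature. FakeTransformContext, FakeSparkSession and run_compute are invented for illustration only. They are not part of transforms.api, and Foundry's real mechanism may differ.

```python
import inspect
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


class FakeSparkSession:
    """Hypothetical stand-in for pyspark.sql.SparkSession; just returns its rows."""

    def createDataFrame(self, data, schema=None):
        return list(data)


@dataclass
class FakeTransformContext:
    """Hypothetical stand-in for transforms.api.TransformContext (not the real class)."""

    spark_session: Any
    parameters: Dict[str, Any] = field(default_factory=dict)


def run_compute(func: Callable[..., Any], context: FakeTransformContext, **inputs: Any) -> Any:
    """Hypothetical runner: add ctx to the call only if func declares a parameter named ctx."""
    kwargs = dict(inputs)
    if "ctx" in inspect.signature(func).parameters:
        kwargs["ctx"] = context
    return func(**kwargs)


def uses_ctx(ctx):
    # Asks for ctx, so the runner passes the context object in
    return ctx.spark_session.createDataFrame([["Alex"]])


def ignores_ctx(names):
    # Does not declare ctx, so it only receives its regular inputs
    return [n.upper() for n in names]


context = FakeTransformContext(spark_session=FakeSparkSession())
print(run_compute(uses_ctx, context))                     # [['Alex']]
print(run_compute(ignores_ctx, context, names=["alex"]))  # ['ALEX']
```

The same compute-function style works with or without ctx, which is why you only need to add the parameter when you actually want something from the context, such as the SparkSession.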