palantir-foundry foundry-code-repositories

Code Repository - What exactly is CTX in pyspark for a code repo?


I have seen ctx used in a code repo. What exactly is it? Is it a built-in library? When would I use it?

I've seen it in examples such as the following:

df = ctx.spark_session.createDataFrame(...

Solution

  • For Code Repositories transforms, you can optionally include a ctx parameter in your compute function, which gives you access to the underlying infrastructure running your job. Typically, you'll use the ctx.spark_session attribute to build your own pyspark.sql.DataFrame objects from Python objects, like:

    from transforms.api import transform_df, Output
    from pyspark.sql import types as T


    @transform_df(
        Output("/my/output")
    )
    def my_compute_function(ctx):
        # ctx is the TransformContext passed in because the parameter is declared
        schema = T.StructType(
            [
                T.StructField("name", T.StringType(), True)
            ]
        )
        # Build a one-row DataFrame from plain Python objects
        return ctx.spark_session.createDataFrame([["Alex"]], schema=schema)

    You'll find a full API description in the documentation for the transforms.api.TransformContext class, which exposes attributes such as spark_session and parameters for you to read.

    Note: the spark_session attribute is a pyspark.sql.SparkSession.
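
To make the "optional parameter" idea concrete, here is a rough, pure-Python sketch of how a runner could pass a context object only to functions that declare a ctx parameter. This is an illustration under assumptions, not Foundry's actual implementation: run_transform, fake_ctx, without_ctx and with_ctx are hypothetical names, and the placeholder string stands in for a real SparkSession.

```python
import inspect
from types import SimpleNamespace


def run_transform(compute_function, context):
    """Call compute_function, passing `context` only if it declares `ctx`.

    Hypothetical stand-in for a framework runner; not Foundry code.
    """
    wants_ctx = "ctx" in inspect.signature(compute_function).parameters
    if wants_ctx:
        return compute_function(ctx=context)
    return compute_function()


# Stand-in for a TransformContext. The real object exposes a SparkSession
# as `spark_session`; here it is only a placeholder string (assumption).
fake_ctx = SimpleNamespace(spark_session="<SparkSession placeholder>")


def without_ctx():
    # Declares no ctx parameter, so the runner passes nothing.
    return "no context requested"


def with_ctx(ctx):
    # Declares ctx, so the runner passes the context object in.
    return f"got {ctx.spark_session}"


result_without = run_transform(without_ctx, fake_ctx)
result_with = run_transform(with_ctx, fake_ctx)
```

The point of the sketch is only that ctx is not something you import: you ask for it by naming it in the function signature, and you leave it out when you don't need it.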