I have seen ctx used in a code repository. What exactly is it? Is it a built-in library? When would I use it?
I've seen it in examples such as the following:
df = ctx.spark.createdataframe(...
For Code Repositories transformations, you can optionally include a ctx parameter in your compute function, which gives you more access to the underlying infrastructure running your job. It is not a library you import; it is an argument passed to your function. Typically, you'll access the ctx.spark_session attribute to make your own pyspark.sql.DataFrame objects from Python objects, like:
from transforms.api import transform_df, Output
from pyspark.sql import types as T

@transform_df(
    Output("/my/output")
)
def my_compute_function(ctx):
    # Single nullable string column called "name"
    schema = T.StructType(
        [
            T.StructField("name", T.StringType(), True)
        ]
    )
    # Build a one-row DataFrame from a plain Python list
    return ctx.spark_session.createDataFrame([["Alex"]], schema=schema)
You'll find a full API description in the documentation on the transforms.api.TransformContext class, where attributes such as spark_session and parameters are available for you to read.
Note: the spark_session attribute has type pyspark.sql.SparkSession.
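To make the "optional parameter" point concrete, here is a toy sketch in plain Python of how a decorator can hand a context object to a function only when that function declares a ctx parameter. This is not how transforms.api is implemented; toy_transform, with_ctx, without_ctx and fake_context are all hypothetical names made up for illustration, and the "spark session" is just a placeholder string.

```python
import inspect
from types import SimpleNamespace


def toy_transform(func):
    """Toy decorator (hypothetical): passes a context only if `func` declares `ctx`."""
    wants_ctx = "ctx" in inspect.signature(func).parameters

    def run(context):
        # Inject the context by parameter name, or call with no arguments.
        return func(ctx=context) if wants_ctx else func()

    return run


@toy_transform
def with_ctx(ctx):
    # `ctx` is never imported; the function simply receives it as an argument.
    return ctx.spark_session


@toy_transform
def without_ctx():
    # Leaving `ctx` out of the signature is fine when it is not needed.
    return "no context needed"


# Hypothetical stand-in for the real context object.
fake_context = SimpleNamespace(spark_session="<spark session placeholder>")

print(with_ctx(fake_context))
print(without_ctx(fake_context))
```

The takeaway is only that ctx behaves like any other function argument: you add it to the signature when you need it and leave it out when you don't.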