palantir-foundry foundry-code-repositories

How can I create an empty dataset from a PySpark schema in Palantir Foundry?

I have a PySpark schema that describes columns and their types for a dataset (which I could write by hand, or get from an existing dataset by going to the 'Columns' tab, then 'Copy PySpark schema').

I want an empty dataset with this schema, for example to use as the backing dataset for a writeback-only ontology object. How can I create this in Foundry?

Solution

To do this in Python, use the Spark session from the transform context to create an empty DataFrame with the schema, and return it as the transform's output. For example:

from pyspark.sql import types as T
from transforms.api import transform_df, configure, Output

SCHEMA = T.StructType([
    T.StructField('entity_name', T.StringType()),
    T.StructField('thing_value', T.IntegerType()),
    T.StructField('created_at', T.TimestampType()),
])


# There is no data to process, so save on compute by running on the driver only (no executors)
@configure(profile=["KUBERNETES_NO_EXECUTORS_SMALL"])
@transform_df(
    Output("/some/dataset/path/or/rid"),
)
def compute(ctx):
    return ctx.spark_session.createDataFrame([], schema=SCHEMA)