I have a PySpark schema that describes columns and their types for a dataset (which I could write by hand, or get from an existing dataset by going to the 'Columns' tab, then 'Copy PySpark schema').
I want an empty dataset with this schema, for example one that could be used as the backing dataset for a writeback-only ontology object. How can I create this in Foundry?
In a Python transform, you can use the Spark session from the transform context to create an empty DataFrame with that schema and return it as the output, for example:
from pyspark.sql import types as T
from transforms.api import transform_df, configure, Output

# Schema for the empty dataset (written by hand or pasted from 'Copy PySpark schema')
SCHEMA = T.StructType([
    T.StructField('entity_name', T.StringType()),
    T.StructField('thing_value', T.IntegerType()),
    T.StructField('created_at', T.TimestampType()),
])


# Given there is no work to do, save on compute by running the build on the driver only
@configure(profile=["KUBERNETES_NO_EXECUTORS_SMALL"])
@transform_df(
    Output("/some/dataset/path/or/rid"),
)
def compute(ctx):
    # A DataFrame with zero rows that still carries the desired schema
    return ctx.spark_session.createDataFrame([], schema=SCHEMA)
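
If you prefer the more explicit @transform decorator (for example, to control how the output is written yourself), the same idea works by writing the empty DataFrame to the output directly. This is a minimal sketch under the same assumptions; the output path and schema are placeholders:

from pyspark.sql import types as T
from transforms.api import transform, configure, Output

SCHEMA = T.StructType([
    T.StructField('entity_name', T.StringType()),
    T.StructField('thing_value', T.IntegerType()),
    T.StructField('created_at', T.TimestampType()),
])


@configure(profile=["KUBERNETES_NO_EXECUTORS_SMALL"])
@transform(
    out=Output("/some/dataset/path/or/rid"),
)
def compute(ctx, out):
    # Write a zero-row DataFrame so the output dataset has the schema but no rows
    out.write_dataframe(ctx.spark_session.createDataFrame([], schema=SCHEMA))

Either way, the build produces a dataset with zero rows whose columns match SCHEMA, which you can then select as the backing dataset for your writeback-only ontology object as described in the question.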