I have a dataset that is the output of a Python transform defined in a Palantir Foundry Code Repository. It has a primary key, but since the data may change over time, I want to validate that this primary key continues to hold.
How can I create a data health expectation or check to ensure the primary key holds in the future?
You can define data expectations in your Python transform, for example:
from transforms.api import transform_df, Input, Output, Check
from transforms import expectations as E


@transform_df(
    # Attach the primary key expectation to the output dataset.
    Output("/path/to/output", checks=[
        Check(E.primary_key("thing_id"), "primary_key: thing_id"),
    ]),
    source_df=Input("/path/to/input"),
)
def compute(source_df):
    # distinct() only removes fully identical rows; the check guards thing_id itself.
    return source_df.select("thing_id", "thing_name").distinct()
More information is available in the Palantir Foundry documentation on defining data expectations.
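To make it concrete what a primary key check guards against, here is a minimal plain-Python sketch. It is not Foundry code: `primary_key_violations` is a hypothetical helper, and it assumes that a valid primary key means the key columns are non-null and unique, which is my understanding of the expectation and worth confirming in the documentation.

```python
from collections import Counter


def primary_key_violations(rows, *key_columns):
    """Return (null_rows, duplicate_keys) for the given key columns.

    Hypothetical helper for illustration only; assumes a primary key
    must be non-null and unique across all rows.
    """
    # Rows where any key column is missing or None.
    null_rows = [r for r in rows if any(r.get(c) is None for c in key_columns)]
    # Count each fully populated key tuple.
    counts = Counter(
        tuple(r[c] for c in key_columns)
        for r in rows
        if all(r.get(c) is not None for c in key_columns)
    )
    # Keys that occur more than once violate uniqueness.
    duplicate_keys = sorted(k for k, n in counts.items() if n > 1)
    return null_rows, duplicate_keys


rows = [
    {"thing_id": 1, "thing_name": "a"},
    {"thing_id": 1, "thing_name": "b"},     # same id, different name
    {"thing_id": None, "thing_name": "c"},  # null key
]
null_rows, duplicate_keys = primary_key_violations(rows, "thing_id")
```

Note that the first two rows differ in `thing_name`, so a `distinct()` over both columns would keep them both. That is the kind of drift the check on the output is there to catch.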