Creating a primary key data health expectation in Palantir Foundry Code Repositories


I have a dataset that is the output of a Python transform defined in a Palantir Foundry Code Repository. It has a primary key, but since the data may change over time, I want to validate that this primary key continues to hold.

How can I create a data health expectation or check to ensure the primary key still holds in the future?


Solution

  • You can define data expectations in your Python transform, for example:

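    from transforms.api import transform_df, Input, Output, Check
    from transforms import expectations as E


    @transform_df(
        Output("/path/to/output", checks=[
            # Named check on the output: thing_id must be a valid primary key.
            Check(E.primary_key("thing_id"), "primary_key: thing_id"),
        ]),
        source_df=Input("/path/to/input"),
    )
    def compute(source_df):
        # distinct() only drops fully identical rows; it does not by itself
        # make thing_id unique, which is what the check above verifies.
        return source_df.select("thing_id", "thing_name").distinct()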
    

    More information is available in the Palantir Foundry documentation on defining data expectations.
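
To make concrete what such a check guards against, here is a plain-Python sketch of the property a primary key is generally understood to require: the key columns are never null and no key value appears more than once. This is only an illustration, not how Foundry evaluates expectations; the helper `primary_key_violations` and the sample rows are hypothetical.

```python
from collections import Counter


def primary_key_violations(rows, key_columns):
    """Return (rows_with_null_keys, duplicated_keys) for the given key columns.

    Hypothetical helper for illustration; rows are plain dicts.
    """
    null_rows = []
    key_counts = Counter()
    for row in rows:
        key = tuple(row.get(col) for col in key_columns)
        if any(value is None for value in key):
            # A null in any key column breaks the primary key.
            null_rows.append(row)
        else:
            key_counts[key] += 1
    # Any key seen more than once breaks uniqueness.
    duplicates = [key for key, count in key_counts.items() if count > 1]
    return null_rows, duplicates


# Made-up sample data for the sketch.
rows = [
    {"thing_id": 1, "thing_name": "a"},
    # Same id, different name: a row-level distinct() would keep both rows.
    {"thing_id": 1, "thing_name": "b"},
    {"thing_id": None, "thing_name": "c"},
]
null_rows, duplicates = primary_key_violations(rows, ["thing_id"])
```

In this sample, the two rows sharing `thing_id` 1 differ in `thing_name`, so deduplicating whole rows would not remove the conflict. That is the kind of drift in upstream data that a primary key check on the output is meant to surface.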