palantir-foundry foundry-data-connection

In Palantir Foundry's Data Connection tool, what's the difference between the transaction type options?


When setting up a file-based sync in Data Connection, I see there are a few different options for 'Transaction Type'. What's the difference between them? When might I use them?


Solution

  • From the Foundry docs:


    Transaction types

    The way dataset files are modified in a transaction depends on the transaction type. There are four possible transaction types: SNAPSHOT, APPEND, UPDATE, and DELETE.

    SNAPSHOT

    A SNAPSHOT transaction replaces the current view of the dataset with a completely new set of files.

    SNAPSHOT transactions are the simplest transaction type, and are the basis of batch pipelines.

    APPEND

    An APPEND transaction adds new files to the current dataset view.

    An APPEND transaction cannot modify existing files in the current dataset view. If an APPEND transaction is opened and existing files are overwritten, then attempting to commit the transaction will fail.

    APPEND transactions are the basis of incremental pipelines. By only syncing new data into Foundry and only processing this new data throughout the pipeline, changes to large datasets can be processed end-to-end in a performant way. However, building and maintaining incremental pipelines comes with additional complexity. Learn more about incremental pipelines.

    UPDATE

    An UPDATE transaction, like an APPEND, adds new files to a dataset view, but may also overwrite the contents of existing files.

    DELETE

    A DELETE transaction removes files that are in the current dataset view.

    Note that committing a DELETE transaction does not delete the underlying file from the backing file system—it simply removes the file reference from the dataset view.

    In practice, DELETE transactions are mostly used to enable data retention workflows. By deleting files on a dataset based on a retention policy—typically based on the age of the file—data can be removed from Foundry, both to minimize storage costs and to comply with data governance requirements.
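
To make the four types quoted above concrete, here is a minimal toy model of how each one changes a dataset view. It is only a sketch: the view is assumed to be a plain dict of path to contents, and `TxType` and `apply_transaction` are hypothetical names for illustration, not part of any Foundry API.

```python
from enum import Enum


class TxType(Enum):
    SNAPSHOT = "SNAPSHOT"
    APPEND = "APPEND"
    UPDATE = "UPDATE"
    DELETE = "DELETE"


def apply_transaction(view, tx_type, files):
    """Return the new view (path -> contents) after committing one transaction.

    `files` maps path -> contents; for DELETE only the paths are used.
    """
    if tx_type is TxType.SNAPSHOT:
        # Replace the whole view with the new set of files.
        return dict(files)
    if tx_type is TxType.APPEND:
        # New files only: touching an existing path makes the commit fail.
        clashes = set(view) & set(files)
        if clashes:
            raise ValueError(f"APPEND cannot overwrite: {sorted(clashes)}")
        return {**view, **files}
    if tx_type is TxType.UPDATE:
        # Add new files and overwrite existing ones.
        return {**view, **files}
    if tx_type is TxType.DELETE:
        # Drop the file references from the view.
        return {path: data for path, data in view.items() if path not in files}
    raise ValueError(f"unknown transaction type: {tx_type}")


view = {}
view = apply_transaction(view, TxType.SNAPSHOT, {"a.csv": "v1", "b.csv": "v1"})
view = apply_transaction(view, TxType.APPEND, {"c.csv": "v1"})
view = apply_transaction(view, TxType.UPDATE, {"a.csv": "v2"})
view = apply_transaction(view, TxType.DELETE, {"b.csv": None})
print(view)  # {'a.csv': 'v2', 'c.csv': 'v1'}
```

Replacing the UPDATE step with an APPEND of `a.csv` would raise, which mirrors the rule that an APPEND cannot modify existing files.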


    Data Connection doesn't let you create a sync with a DELETE transaction type, because a sync that purely deletes data doesn't really make sense! If you'd like to remove files from your synced dataset, you can run the sync as a SNAPSHOT transaction, which replaces the current view with only the files you still want. Note that previous versions of the dataset will still include the removed files.
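
A rough sketch of why older versions keep the files: if you think of the dataset as an ordered log of transactions, the view at any version is computed from the transactions up to that point. The `view_at` helper and the file names below are made up for illustration, and only SNAPSHOT and APPEND are modeled.

```python
def view_at(transactions, version):
    """Return the file paths visible after the first `version` transactions.

    Each transaction is a (type, paths) pair.
    """
    view = set()
    for tx_type, paths in transactions[:version]:
        if tx_type == "SNAPSHOT":
            view = set(paths)      # replace the view entirely
        elif tx_type == "APPEND":
            view |= set(paths)     # add files to the view
        else:
            raise ValueError(f"unsupported transaction type: {tx_type}")
    return view


log = [
    ("SNAPSHOT", ["orders_2021.csv", "orders_2022.csv"]),
    ("APPEND", ["orders_2023.csv"]),
    # Re-sync as a SNAPSHOT that leaves out the 2021 file.
    ("SNAPSHOT", ["orders_2022.csv", "orders_2023.csv"]),
]

current = view_at(log, 3)   # latest view: no 2021 file
previous = view_at(log, 2)  # earlier version: 2021 file still listed
print(sorted(current))
print(sorted(previous))
```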

    You can combine an APPEND or UPDATE transaction type with file-based sync filters to ingest only the new or changed files on each run of your sync.
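
The idea behind that combination can be sketched as follows. This is not how Data Connection implements its filters; it assumes a hypothetical record of what the previous run synced (path mapped to last-modified time and size), and `SourceFile`, `files_to_ingest`, and `suggest_transaction_type` are invented names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceFile:
    path: str
    modified: int  # last-modified time, e.g. epoch seconds
    size: int      # size in bytes


def files_to_ingest(source_files, already_synced):
    """Keep only files that are new or changed since the previous run.

    `already_synced` maps path -> (modified, size) as recorded at the last sync.
    """
    return [
        f for f in source_files
        if already_synced.get(f.path) != (f.modified, f.size)
    ]


def suggest_transaction_type(files, already_synced):
    """APPEND is enough when every file is new; changed files need UPDATE."""
    changed = any(f.path in already_synced for f in files)
    return "UPDATE" if changed else "APPEND"


source = [
    SourceFile("jan.csv", modified=100, size=10),
    SourceFile("feb.csv", modified=250, size=12),  # modified since last run
    SourceFile("mar.csv", modified=300, size=8),   # not seen before
]
synced = {"jan.csv": (100, 10), "feb.csv": (200, 12)}

picked = files_to_ingest(source, synced)
print([f.path for f in picked])                  # ['feb.csv', 'mar.csv']
print(suggest_transaction_type(picked, synced))  # UPDATE
```

If the source only ever gains new files and never rewrites old ones, the same filter with APPEND is sufficient, and the commit will fail loudly if a file is unexpectedly overwritten.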