pyspark, palantir-foundry, foundry-code-repositories, incremental-build

Error: "Differing start transactions for incrementality" when running incremental transform in Palantir Foundry


I'm using a Fusion sheet to create a dummy dataset and want to run a transform in incremental mode that takes this dummy dataset as input. When I manually append a row to the dataset and re-run the transform, I would expect it to run incrementally, but it runs in SNAPSHOT mode and throws the error below. I need this transform to always be truly incremental because I assign unique IDs here.

transforms._errors.RequiredIncrementalTransform: View start transactions differ for input dataset ri.foundry.main.dataset....-e54b44db2243. Was ri.foundry.main.transaction....-b9b7d303518c, now ri.foundry.main.transaction....-7395d1f42b71

How can I solve this error?


Solution

  • Foundry expects the input of an incremental transform to be updated incrementally, but a Fusion sheet rewrites the whole dummy dataset each time you update your data, which creates a new, non-incremental snapshot. Therefore, you must set the snapshot_inputs argument of the @incremental decorator:

    snapshot_inputs=['your_input_variable']
    

    So for example:

    @incremental(semantic_version=1, require_incremental=True, snapshot_inputs=['your_input_variable'])
    

    With this setting, the input is excluded from the incrementality check and is read in full on every run, so you can freely change your dummy dataset, including its schema, and the transform will always see its current contents.

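    By the way, the same happens if you write your dataset not with Fusion but with another, non-incremental transform, for example one that builds its output with ctx.spark_session.createDataFrame(), because that transform also replaces its output with a new snapshot on every build.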

    Read more on snapshot inputs in the Foundry docs.
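
To picture why the error mentions differing start transactions, here is a minimal toy model in plain Python. It is not Foundry's implementation or API: `Transaction`, `ToyDataset`, `view_start` and `can_run_incrementally` are hypothetical names invented for illustration, and the model assumes only that a dataset view starts at its most recent SNAPSHOT transaction and that an incremental run compares the start it saw last time with the current one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transaction:
    rid: str   # stand-in for a transaction identifier
    kind: str  # "SNAPSHOT" or "APPEND"


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset's transaction history."""
    transactions: List[Transaction] = field(default_factory=list)

    def commit(self, rid: str, kind: str) -> None:
        self.transactions.append(Transaction(rid, kind))

    def view_start(self) -> Optional[str]:
        # Assumption: the current view begins at the latest SNAPSHOT.
        for txn in reversed(self.transactions):
            if txn.kind == "SNAPSHOT":
                return txn.rid
        # No SNAPSHOT yet: fall back to the first transaction, if any.
        return self.transactions[0].rid if self.transactions else None


def can_run_incrementally(previous_start: Optional[str],
                          dataset: ToyDataset,
                          is_snapshot_input: bool = False) -> bool:
    # Inputs declared as snapshot inputs are skipped by the check.
    if is_snapshot_input:
        return True
    # Otherwise the view must still start where it did on the last run.
    return previous_start == dataset.view_start()


# Case 1: the input is rewritten (as a Fusion sync does in the answer above).
fusion_sync = ToyDataset()
fusion_sync.commit("txn-1", "SNAPSHOT")
seen_start = fusion_sync.view_start()      # recorded by the previous run
fusion_sync.commit("txn-2", "SNAPSHOT")    # whole dataset rewritten

rewritten_ok = can_run_incrementally(seen_start, fusion_sync)
snapshot_ok = can_run_incrementally(seen_start, fusion_sync,
                                    is_snapshot_input=True)

# Case 2: the input only receives an APPEND transaction.
appended = ToyDataset()
appended.commit("txn-1", "SNAPSHOT")
seen_start_2 = appended.view_start()
appended.commit("txn-2", "APPEND")

append_ok = can_run_incrementally(seen_start_2, appended)

print(rewritten_ok, snapshot_ok, append_ok)
```

In this sketch the rewritten input fails the check unless it is treated as a snapshot input, while the appended input passes, which mirrors the behaviour described in the answer.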