Tags: pyspark · ontology · palantir-foundry · foundry-code-repositories

Palantir Foundry: using 'context' in a Python transform fails


I would like to compare data coming from a Fusion sheet with data integrated in an object type; since users can update and create data in Object Explorer, I need to compare against the writeback dataset's data.

The writeback dataset can't be an input in a transform whose output is the related backing dataset: that raises a circular dependency error.
So I tried to read the writeback dataset from the transform context, using something like this:

from transforms.api import transform, Input, Output

@transform(
    my_fusion=Input("fusion_dataset_path"),
    my_output=Output("backing_dataset_path")
)
def my_compute_function(my_fusion, my_output, ctx):
    # Attempt to read the writeback dataset through the transform context
    my_writeback = ctx._foundry.input("write_back_dataset_path", branch="master")
    my_df = ...  # all comparison logic goes here
    my_output.write_dataframe(my_df)

The problem is that it is rejected with an error:

ValueError: Could not find resource with path write_back_dataset_path

I do not understand why it fails in this case, since I have already used this kind of syntax in other transforms.
Thanks in advance for your help.


Solution

  • After several searches and attempts, I came to the conclusion that this is simply NOT possible.

    There are two ways to involve a dataset in a transform: either declare it as an input or output, or open it through the context. Neither option is allowed here:

    • A writeback dataset cannot be used as an input when its related backing dataset is the output: this raises a circular dependency error. The check is performed at the repository level, so inserting intermediate datasets does not help. And setting the writeback dataset as an output does not make sense in this case.
    • A writeback dataset is not part of the repository context, because the repository is not its owner: a writeback dataset is managed by the Ontology process, and the Ontology is responsible for building it.
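    Why an intermediate dataset does not help can be pictured as a cycle search over the dataset dependency graph: adding a node in the middle only lengthens the cycle, it never removes it. Below is a minimal plain-Python illustration (not Foundry API; the graph and dataset names are invented for the example):

    ```python
    def has_cycle(graph, start):
        """Depth-first search for a cycle reachable from `start` in a
        dependency graph mapping dataset -> list of its input datasets."""
        seen = set()

        def visit(node, path):
            if node in path:
                return True          # returned to a node on the current path: cycle
            if node in seen:
                return False         # already explored, no cycle found through it
            seen.add(node)
            return any(visit(dep, path | {node}) for dep in graph.get(node, []))

        return visit(start, set())

    # The writeback derives (via the Ontology) from the backing dataset,
    # while the backing dataset would take the writeback as an input:
    graph = {
        "backing": ["fusion", "writeback"],   # proposed transform's inputs
        "writeback": ["intermediate"],        # even with an intermediate dataset...
        "intermediate": ["backing"],          # ...the loop back to "backing" remains
    }
    print(has_cycle(graph, "backing"))  # True: the cycle survives the intermediate
    ```
    
    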

    I finally found a workaround for my problem: the validation process runs twice. The first run produces a 'technical' dataset, invisible to the user, where all updates are applied. The second run builds the output dataset the user actually sees. The user applies changes to this visible dataset, and those changes are replicated to the technical one, so the second validation pass can run again and validate the edited data. All of this is based on action types and a Workshop user interface.
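    To make the two-pass mechanics concrete, here is a hedged plain-Python sketch. The function names, row schema, and validation rule are all invented for illustration; in Foundry the edit capture would go through action types and the UI through Workshop:

    ```python
    def validate(rows):
        """Hypothetical validation rule: flag rows with a non-positive amount."""
        return [{**r, "valid": r["amount"] > 0} for r in rows]

    def run_pipeline(source_rows, user_edits):
        # Pass 1: build the hidden 'technical' dataset with all edits applied.
        technical = {r["id"]: dict(r) for r in source_rows}
        for edit in user_edits:                  # edits captured via action types
            technical[edit["id"]].update(edit)   # replicate user changes back

        # Pass 2: re-validate the edited data and build the visible output.
        return validate(list(technical.values()))

    rows = [{"id": 1, "amount": -5}, {"id": 2, "amount": 10}]
    edits = [{"id": 1, "amount": 7}]             # user fixes row 1 in the UI
    out = run_pipeline(rows, edits)
    print(out)  # both rows now pass validation
    ```
    
    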