Tags: dictionary, pyspark, dataset, palantir-foundry, foundry-code-repositories

How do I transform a dataset into a dictionary inside a repo? I am using PySpark within Foundry


I created a Fusion sheet and synced it to a dataset. Now I want to use that dataset to create a dictionary in the repo, where I am using PySpark. Later I want to pass that dictionary along so that it populates the column descriptions, as described in "Is there a tool available within Foundry that can automatically populate column descriptions? If so, what is it called?".

It would be great if anyone could help me create the dictionary from the dataset using PySpark in the repo.


Solution

  • The following code converts your PySpark dataframe into a list of dictionaries, one per row:

    # collect() brings every row to the driver; asDict() turns each Row into a plain dict
    fusion_rows = [row.asDict() for row in fusion_df.collect()]
    

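    However, in your particular case, you can build the dictionary directly and pass it when writing the output:

    # fusion_df is the dataframe read from the synced Fusion sheet dataset
    # (assumed small enough to collect on the driver)
    col_descriptions = {row["column_name"]: row["description"] for row in fusion_df.collect()}
    my_output.write_dataframe(
        my_input.dataframe(),
        column_descriptions=col_descriptions
    )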
    

    This assumes your Fusion sheet looks like this:

    +------------+------------------+
    | column_name|       description|
    +------------+------------------+
    |       col_A| description for A|
    |       col_B| description for B|
    +------------+------------------+
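
If the sheet might contain blank cells or rows for columns that the output dataset does not have, the dictionary can be built a little more defensively. The sketch below is only an illustration: `build_column_descriptions` is a hypothetical helper (not part of Foundry or PySpark), and it assumes the two sheet columns are named `column_name` and `description` as in the table above. It accepts anything indexable by key, so it works on the result of `fusion_df.collect()` as well as on the plain dicts used here.

```python
def build_column_descriptions(rows, output_columns,
                              name_key="column_name", desc_key="description"):
    """Return {column name: description}, skipping rows that cannot be used.

    Hypothetical helper: `rows` is any iterable of key-indexable records,
    e.g. the list returned by fusion_df.collect() or a list of dicts.
    """
    valid = set(output_columns)
    descriptions = {}
    for row in rows:
        name, desc = row[name_key], row[desc_key]
        # Skip blank cells in the sheet.
        if name is None or desc is None:
            continue
        name = name.strip()
        # Keep only columns that actually exist in the output dataframe.
        if name in valid:
            descriptions[name] = desc.strip()
    return descriptions


# Plain dicts stand in for the collected Spark rows in this example.
rows = [
    {"column_name": "col_A", "description": "description for A"},
    {"column_name": " col_B ", "description": "description for B"},
    {"column_name": "col_Z", "description": "column missing from output"},
    {"column_name": "col_C", "description": None},
]
col_descriptions = build_column_descriptions(rows, ["col_A", "col_B", "col_C"])
print(col_descriptions)
```

Inside the transform, the same call would presumably take `fusion_df.collect()` as `rows` and `my_input.dataframe().columns` as `output_columns`, with the result passed as `column_descriptions` exactly as in the snippet above.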