Tags: azure · pyspark · databricks · azure-databricks · azure-synapse

How to convert Azure Synapse Dataframe into JSON on Databricks?


Can I convert my Azure Synapse DataFrame into JSON? When I tried, I got an error. I used the pandas DataFrame method df.to_json(), because I assumed an Azure Synapse DataFrame is the same as a pandas DataFrame.

So here is the script for my Synapse utility:

import json

class UtilAzSynapse(UtilAzSynapse):  # re-declared to extend the existing notebook class
    @staticmethod
    def write_to_synapse(df, table, write_mode, url, tempDir):
        log_msg = {
            "table": table,
            "url": url,
            "tempDir": tempDir
        }
        UtilInfo.pnt("UtilAzSynapse.write_to_synapse log:\n" +
                     json.dumps(log_msg, indent=4))

        (df.write
          .format("com.databricks.spark.sqldw") # Commented at 20200121: SQL DW connection exception (email keyword: Databricks cannot access the DW)
#         .format("jdbc") # Added at 20200121
          .option("tableOptions", "CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = ROUND_ROBIN") # Added at 20200121
          .option("url", url)
          .option("dbtable", table)
          .option("forward_spark_azure_storage_credentials", "True")
          .option("tempdir", tempDir)
          .mode(write_mode)
          .save()
        )
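For reference, the json.dumps(log_msg, indent=4) call above only pretty-prints the logged parameters; it does not affect the write itself. A minimal standalone sketch (with made-up placeholder values, not the real table/url/tempDir) looks like:

```python
import json

# Placeholder values standing in for the real table/url/tempDir arguments
log_msg = {
    "table": "dbo.FactSales",
    "url": "jdbc:sqlserver://example.database.windows.net:1433;database=mydw",
    "tempDir": "wasbs://temp@example.blob.core.windows.net/sqlDwWriteTempDirs"
}

# indent=4 produces a multi-line, human-readable string for the log output
formatted = json.dumps(log_msg, indent=4)
print("UtilAzSynapse.write_to_synapse log:\n" + formatted)
```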

And this is how I select my table:

temp_write_dir = azBlob.get_blob_path(
    container = '03-analyse',
    folder_path = f"{params['working_dir']}/sqlDwWriteTempDirs"
)
print(f"temp_write_dir = {temp_write_dir}")
df_dim_store = azSynapse._read_from_synapse(fact_sales_sql, tempDir=temp_read_dir)
df_dim_store = df_dim_store.to_json()

Error:

AttributeError: 'DataFrame' object has no attribute 'to_json'

The reason I need to convert my DataFrame into JSON is that when I tried my write_to_synapse function, the error message indicated the DataFrame needed to be converted to JSON format.


Solution

  • A PySpark DataFrame is not the same thing as a pandas DataFrame, so it does not have a to_json method.

    In PySpark you should be able to do:

    df.toJSON()
    

    You can find more information here: pyspark.sql.DataFrame.toJSON
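Note the difference in output shape: pandas to_json() returns a single JSON string for the whole frame, while PySpark's toJSON() returns an RDD of JSON strings, one per row (call .collect() or .first() on it to materialize them). Conceptually, each row is serialized like this stdlib-only sketch (the rows below are made-up sample data, not from the question):

```python
import json

# Made-up rows standing in for a Spark DataFrame's contents
rows = [
    {"store_id": 1, "store_name": "North"},
    {"store_id": 2, "store_name": "South"},
]

# PySpark's df.toJSON() yields one JSON document per row, much like:
json_lines = [json.dumps(row) for row in rows]
for line in json_lines:
    print(line)
```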