azure-databricks data-warehouse azure-synapse

DBFS FileStore Equivalent in Azure Synapse?


Is there an equivalent to Databricks' DBFS FileStore system in Azure Synapse? Is it possible to upload csv files and read them into pandas dataframes within Azure Synapse notebooks? Ideally I'd like to not load the csv into a database; looking for something as simple as DBFS' FileStore folder.

In Databricks: pd.read_csv('/dbfs/FileStore/name_of_file.csv')

In Synapse: ?

I don't see anywhere to upload csv files directly like in DBFS.


Solution

  • The Azure Synapse equivalent of the FileStore in Databricks is the data lake file system linked to your Synapse workspace. In Synapse Studio, navigate to Data -> Linked to find the linked storage account. This storage account was created or assigned when you created your workspace.

    This primary data lake functions much like the FileStore in Azure Databricks. You can upload the required files through the UI in that Data -> Linked view. To load a file into a DataFrame, right click on it and choose New notebook -> Load to DataFrame.

    The UI automatically generates code that loads the CSV file into a Spark DataFrame. You can modify this code to load the file as a pandas DataFrame instead.

    '''
    # This is the code Synapse generates when you select a file and choose Load to DataFrame.
    # CONTAINER and STORAGE_ACCOUNT are placeholders for your own file system and storage account names.

    df = spark.read.load('abfss://CONTAINER@STORAGE_ACCOUNT.dfs.core.windows.net/sample_1.csv', format='csv'
    ## If header exists uncomment line below
    ##, header=True
    )
    display(df.limit(10))
    '''
    # Use the following code to load the file as a pandas DataFrame instead

    import pandas as pd
    df = pd.read_csv('abfss://CONTAINER@STORAGE_ACCOUNT.dfs.core.windows.net/sample_1.csv')
    

    This data lake storage is connected to the workspace through a linked service (visible under Manage -> Linked services). The linked service is created by default from the data lake and file system information that you must provide when creating the Synapse workspace.
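To avoid retyping the long path for every file, the pandas approach above can be wrapped in a small helper. This is only a sketch: the function names `abfss_uri` and `read_csv_from_lake` are hypothetical, and the container and account values are placeholders. It assumes the notebook runs on a Synapse Spark pool where pandas can resolve `abfss://` paths. For the workspace's primary storage account no extra credentials should be needed. For other accounts you would probably have to pass something through `storage_options`; the exact keys depend on your runtime, so treat that part as an assumption.

```python
from typing import Optional


def abfss_uri(container: str, account: str, path: str) -> str:
    """Build a data lake URI: abfss://CONTAINER@ACCOUNT.dfs.core.windows.net/PATH."""
    # Strip leading slashes so the result never contains a double slash after the host.
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def read_csv_from_lake(container: str, account: str, path: str,
                       storage_options: Optional[dict] = None):
    """Read a CSV file from the linked data lake into a pandas DataFrame.

    storage_options is forwarded to pandas unchanged; leave it as None for the
    workspace's primary storage account (assumption: default auth applies there).
    """
    import pandas as pd  # imported here so abfss_uri works even without pandas installed

    return pd.read_csv(abfss_uri(container, account, path),
                       storage_options=storage_options)


# Example call inside a Synapse notebook (placeholder names, not real resources):
# df = read_csv_from_lake("CONTAINER", "STORAGE_ACCOUNT", "sample_1.csv")
```

If you would rather keep the Spark code that Synapse generates, calling `.toPandas()` on the Spark DataFrame is another way to end up with a pandas DataFrame, at the cost of collecting all rows onto the driver.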