microsoft-fabric

Load data to MS Fabric Warehouse from notebook


I want to load data into a MS Fabric Warehouse using a python notebook, without creating a pipeline / dataflow. That is, to connect using python and issue SQL queries from there.

I tried to connect using pyodbc, following the article https://debruyn.dev/2023/connect-to-fabric-lakehouses-warehouses-from-python-code/, but the authentication doesn't seem to work (interactive couldn't find the browser and CLI couldn't find the Azure CLI).

Ideally, I'd like a generic solution where the DB name / workspace would be set up through code (working on a library).


Solution

  • The problem with that article is that it uses an interactive auth flow. You need to use a Service Principal to connect. So provision an App Registration in Entra ID, grant it access to your warehouse, and use that to connect.

    There's no need to handle the AccessToken directly; the ODBC driver will do that for you, e.g.:

    import pyodbc
    import pandas as pd

    # Warehouse SQL connection string (copy it from the warehouse settings in Fabric)
    server = "x6e...dedicated.windows.net"
    database = "YourDatabase"

    # Service principal credentials; mssparkutils is available in Fabric notebooks
    client_id = "91b...ec5"
    client_secret = mssparkutils.credentials.getSecret(
        "https://YourKeyVault.vault.azure.net/",
        "your-client-secret-secret-name",
    )

    sql = "select * from sys.objects"

    constr = (
        f"driver=ODBC Driver 18 for SQL Server;server={server};database={database};"
        f"UID={client_id};PWD={client_secret};"
        f"Authentication=ActiveDirectoryServicePrincipal;Encrypt=yes;Timeout=60;"
    )

    con = pyodbc.connect(constr)
    data = pd.read_sql(sql, con)
    print(data)
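
    Since the question asks for a generic solution where the database / workspace are set through code, one option is to wrap the connection-string construction in a small helper. A minimal sketch (the function name and parameters are my own, not a Fabric API):

```python
def build_fabric_constr(server: str, database: str,
                        client_id: str, client_secret: str,
                        timeout: int = 60) -> str:
    """Build an ODBC connection string for a Fabric warehouse using
    service-principal auth; pass the result to pyodbc.connect()."""
    return (
        f"driver=ODBC Driver 18 for SQL Server;"
        f"server={server};database={database};"
        f"UID={client_id};PWD={client_secret};"
        f"Authentication=ActiveDirectoryServicePrincipal;"
        f"Encrypt=yes;Timeout={timeout};"
    )
```

    This keeps the server / database / credential wiring in one place, so library code can point at any warehouse by passing different arguments.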
    

    To load any non-trivial amount of data, you’ll want to write it to a lakehouse as parquet or delta and have the DW load it from there. See

    Ingest data into your Warehouse using Transact-SQL, e.g. write the table to the lakehouse and run:

    CREATE TABLE [dbo].[my_warehouse_table]
    AS
    SELECT *
    FROM [my_lakehouse].[dbo].[my_staging_table]
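
    If the table names vary (again with a library in mind), the CTAS statement can be generated from the names. A minimal sketch using the example identifiers above, with naive bracket quoting rather than a full T-SQL identifier escape:

```python
def ctas_from_lakehouse(warehouse_table: str, lakehouse: str,
                        staging_table: str, schema: str = "dbo") -> str:
    """Generate a CREATE TABLE ... AS SELECT that loads a warehouse
    table from a staged lakehouse table via three-part naming."""
    def q(name: str) -> str:
        # naive [bracket] quoting; escapes closing brackets only
        return "[" + name.replace("]", "]]") + "]"

    return (
        f"CREATE TABLE {q(schema)}.{q(warehouse_table)}\n"
        f"AS\n"
        f"SELECT *\n"
        f"FROM {q(lakehouse)}.{q(schema)}.{q(staging_table)}"
    )
```

    Execute the result with cursor.execute(...) on the pyodbc connection above; pyodbc does not autocommit by default, so call con.commit() (or open the connection with autocommit=True) for the table to be created.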