Search code examples
pythonsqlazurejupyter-notebookazure-machine-learning-service

Can I import data from On-Premises SQL Server Database to Azure Machine Learning virtual machine?


On the limited Azure Machine Learning Studio, one can import data from an On-Premises SQL Server Database. What about the ability to do the exact same thing on a python jupyter notebook on a virtual machine from the Azure Machine Learning Services workspace ?

It does not seem possible from what I've found in the documentation. Data sources would be limited in Azure ML Services : "Currently, the list of supported Azure storage services that can be registered as datastores are Azure Blob Container, Azure File Share, Azure Data Lake, Azure Data Lake Gen2, Azure SQL Database, Azure PostgreSQL, and Databricks File System"

Thank you in advance for your assistance


Solution

  • As of today, you can load SQL data, but only a MS SQL Server source (also on-premise) is supported.

    Using azureml.dataprep, code would read along the lines of

    import azureml.dataprep as dprep
    
    secret = dprep.register_secret(value="[SECRET-PASSWORD]", id="[SECRET-ID]")
    
    ds = dprep.MSSQLDataSource(server_name="[SERVER-NAME]",
                               database_name="[DATABASE-NAME]",
                               user_name="[DATABASE-USERNAME]",
                               password=secret)
    
    dflow = dprep.read_sql(ds, "SELECT top 100 * FROM [YourDB].[ATable]")
    # print first records
    dflow.head(5)
    

    As far as I understand the APIs are under heavy development and azureml.dataprep may be soon superseded by functionality provided by the Dataset class.