Search code examples
azureazure-sql-databaseazure-machine-learning-serviceazure-sdk-python

Azure Machine Learning Studio: cannot create Datastore from Azure SQL Database


I am trying to connect to an Azure SQL Database from inside Azure Machine Learning Studio. Based on https://learn.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore.datastore?view=azure-ml-py, it seems that the recommended pattern is to create a Datastore using the Datastore.register_azure_sql_database method as follows:

import os
from azureml.core import Workspace, Datastore

ws = Workspace.from_config() # asks for interactive authentication the first time

sql_datastore_name  = "datastore_test_01" # any name should be fine
server_name         = os.getenv("SQL_SERVERNAME"    , "{SQL_SERVERNAME}") # Name of the Azure SQL server
database_name       = os.getenv("SQL_DATABASENAME"  , "{SQL_DATABASENAME}") # Name of the Azure SQL database
username            = os.getenv("SQL_USER_NAME"     , "{SQL_USER_NAME}") # The username of the database user.
password            = os.getenv("SQL_USER_PASSWORD" , "{SQL_USER_PASSWORD}") # The password of the database user.

sql_datastore = Datastore.register_azure_sql_database(workspace      = ws,
                                                      datastore_name = sql_datastore_name,
                                                      server_name    = server_name,
                                                      database_name  = database_name,
                                                      username       = username,
                                                      password       = password)

I am pretty sure I have set all parameters right, having copied them from the ADO.NET connection string at my SQL Database resource --> Settings --> Connection strings:

Server=tcp:{SQL_SERVERNAME}.database.windows.net,1433;Initial Catalog={SQL_DATABASENAME};Persist Security Info=False;User ID={SQL_USER_NAME};Password={SQL_USER_PASSWORD};MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;

However, I get the following error:

Registering datastore failed with a 400 error code and error message 'Azure SQL Database Error -2146232060: Please check the correctness of the datastore information.'

Am I missing something? E.g., a firewall rule? I have also tried adding the Azure ML compute resource's public IP address to the list of allowed IP addresses in my SQL Database resource, but still no success.


UPDATE: adding skip_validation = True to Datastore.register_azure_sql_database solves the issue. I can then query the data with

from azureml.core import Dataset
from azureml.data.datapath import DataPath

query   = DataPath(sql_datastore, 'SELECT * FROM my_table')
tabular = Dataset.Tabular.from_sql_query(query, query_timeout = 10)
df = tabular.to_pandas_dataframe()

Solution

  • is the datastore behind vnet? where are you running the registration code above? On a compute instance behind the same vnet? here is the doc that describe what you need to do to connect to data behind vnet: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-enable-virtual-network#use-datastores-and-datasets