Search code examples
azureazure-sql-databaseazure-machine-learning-serviceazureml-python-sdk

Parameterized SQL query in Azure ML


Background: There seems to be a way to parameterize DataPath with PipelineParameter https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-showcasing-datapath-and-pipelineparameter.ipynb

But I'd like to parameterize my SQL query with PipelineParameter, for example, with this query

sql_query = """
SELECT id, foo, bar FROM baz
WHERE baz.id BETWEEN 10 AND 20
"""
dataset = Dataset.Tabular.from_sql_query((sql_datastore, sql_query))

I'd like to use PipelineParameter to parameterize 10 and 20 as param_1 and param_2. Is this possible?


Solution

  • Found a way to do this:

    Pass your params to PythonScriptStep

    param_1 = PipelineParameter(name='min_id', default_value=5)
    param_2 = PipelineParameter(name='max_id', default_value=10)
    sql_datastore = "sql_datastore"
    step = PythonScriptStep(script_name='script.py', arguments=[param_1, param_2, 
    sql_datastore])
    

    In script.py

    min_id_param = sys.argv[1]
    max_id_param = sys.argv[2]
    sql_datastore_name = sys.argv[3]
    query = """
    SELECT id, foo, bar FROM baz
    WHERE baz.id BETWEEN {} AND {}
    """.format(min_id_param, max_id_param)
    run = Run.get_context()
    sql_datastore = Datastore.get(run.experiment.workspace, sql_datastore_name)
    dataset = Dataset.Tabular.from_sql_query((sql_datastore, query))