Tags: python, pandas, azure, anaconda, azure-machine-learning-service

Updating pandas to version 0.19 in Azure ML Studio


I would really like to use some of the updated functions in pandas 0.19, but Azure ML Studio uses pandas 0.18 as part of the Anaconda 4.0 bundle. Is there a way to update the version that is used within the "Execute Python Script" components?


Solution

  • The steps below show how to update the version of the pandas library used in Execute Python Script.

    Step 1 : Use the virtualenv component to create an independent Python runtime environment on your system. If you don't have it, install it first with the command pip install virtualenv.

    If the installation succeeded, you will see it in your python/Scripts folder.

    Step 2 : Run the command to create an independent Python runtime environment.

    Step 3 : Then go into the Scripts folder of the created directory and activate the environment (this step is important, don't miss it).

    Keep this command window open and run pip install pandas==0.19 in it, so that the library and its dependencies are downloaded into the activated environment.
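
    Steps 1 to 3 can also be scripted. The following is only a sketch, assuming virtualenv is already installed for the current interpreter; the function names and the pandas_env directory name are hypothetical and not part of the original answer. It calls the environment's own pip directly, which installs into that environment without a separate activation step.

```python
import subprocess
import sys
from pathlib import Path


def pip_executable(env_dir):
    """Return the path of pip inside a virtual environment directory."""
    env_dir = Path(env_dir)
    # Windows environments keep their executables in Scripts, POSIX ones in bin.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "pip.exe"
    return env_dir / "bin" / "pip"


def build_commands(env_dir, requirement="pandas==0.19"):
    """Build the two commands: create the environment, then install into it."""
    create = [sys.executable, "-m", "virtualenv", str(env_dir)]
    install = [str(pip_executable(env_dir)), "install", requirement]
    return [create, install]


def run_commands(commands):
    """Run each command in order and stop at the first failure."""
    for command in commands:
        subprocess.check_call(command)


# Example call (hypothetical directory name, needs network access):
# run_commands(build_commands("pandas_env"))
```

    The commands are built separately from running them so that they can be inspected before anything is downloaded.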

    Step 4 : Compress all of the files in the Lib/site-packages folder into a zip package (I'm calling it pandas - package here)
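
    Step 4 can be done with any zip tool. As a minimal sketch using only the Python standard library, the hypothetical helper below (not part of the original answer) writes every file under a site-packages folder into a zip, with the packages placed at the root of the archive rather than under a Lib/site-packages prefix.

```python
import zipfile
from pathlib import Path


def zip_site_packages(site_packages, zip_path):
    """Zip every file under site_packages, with paths relative to that folder."""
    site_packages = Path(site_packages)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(site_packages.rglob("*")):
            if path.is_file():
                # Store "pandas/__init__.py", not "Lib/site-packages/pandas/...".
                archive.write(path, path.relative_to(site_packages).as_posix())
    return zip_path


# Example call (hypothetical paths; write the zip outside site-packages
# so the archive does not try to include itself):
# zip_site_packages("pandas_env/Lib/site-packages", "pandas-package.zip")
```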

    Step 5 : Upload the zip package to the Azure Machine Learning workspace as a dataset.

    For the specific steps, please refer to the Technical Notes.

    Once the upload succeeds, you will see the package in the dataset list.

    Step 6 : Before the definition of the azureml_main method in the Execute Python Script module, remove the old pandas modules and their dependencies from sys.modules, then import pandas again, as in the code below.

    import sys
    import pandas as pd
    print(pd.__version__)  # version preinstalled in Azure ML Studio

    # Remove the cached pandas package and its dependencies so they can be re-imported.
    del sys.modules['pandas']
    del sys.modules['numpy']
    del sys.modules['pytz']
    del sys.modules['six']
    del sys.modules['dateutil']

    # Search the unpacked zip package before the preinstalled libraries.
    sys.path.insert(0, '.\\Script Bundle')

    # Also remove every cached submodule of those packages.
    for td in [m for m in sys.modules if m.startswith(('pandas.', 'numpy.', 'pytz.', 'dateutil.', 'six.'))]:
        del sys.modules[td]

    import pandas as pd
    print(pd.__version__)  # version loaded from the uploaded zip package

    # The entry point function can contain up to two input arguments:
    #   Param<dataframe1>: a pandas.DataFrame
    #   Param<dataframe2>: a pandas.DataFrame
    def azureml_main(dataframe1 = None, dataframe2 = None):
        # Placeholder body: pass the first input through unchanged.
        return dataframe1,
    

    The logs then show the result: the old version, 0.14.0, is printed first, followed by the new version, 0.19.0, which is loaded from the uploaded zip file.

    [Information]         0.14.0
    [Information]         0.19.0
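
    The cleanup in Step 6 raises a KeyError if one of the listed packages was never imported. As a hedged alternative, the hypothetical helper below (not part of the original answer) removes only what is actually cached and matches on the top-level package name, so the packages and their submodules are handled in one pass.

```python
import sys


def purge_modules(package_names):
    """Remove the given top-level packages and their submodules from sys.modules.

    Returns the names that were removed; packages that were never
    imported are skipped silently.
    """
    removed = []
    # Iterate over a copy of the keys because sys.modules is modified in the loop.
    for module_name in list(sys.modules):
        top_level = module_name.split(".")[0]
        if top_level in package_names:
            del sys.modules[module_name]
            removed.append(module_name)
    return removed


# Example call, followed by the same path change and re-import as in Step 6:
# purge_modules({"pandas", "numpy", "pytz", "six", "dateutil"})
# sys.path.insert(0, ".\\Script Bundle")
# import pandas as pd
```

    Names bound before the purge, such as an earlier pd, still point at the old module objects, which is why the import has to be repeated afterwards.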
    

    You could also refer to these threads: Access blob file using time stamp in Azure and reload with reset.

    Hope it helps you.