Tags: python, pandas, azure, azure-devops, azure-machine-learning-service

Resource Not Found Error: When Reading CSV from Azure Blob using Pandas with its SAS URL


I am trying to perform dataset versioning: I read a CSV file into a pandas DataFrame and then create a new version of an Azure ML Dataset. I am running the code below in an Azure CLI job within Azure DevOps.

df = pd.read_csv(blob_sas_url)

At this line, I get a 404 Error. Error Message:

urllib.error.HTTPError: HTTP Error 404: The specified resource does not exist

When I run the same code locally, I can read the CSV file into a DataFrame without any problem. The SAS URL and token have not expired either.

How can I solve this issue?

Edit - Code

import argparse

import pandas as pd
from azureml.core import Run


class Pipeline:
    def __init__(self, args):
        self.args = args
        self.run = Run.get_context()
        self.workspace = self.run.experiment.workspace

    def get_Dataframe(self):
        # Print the URL to check what the script actually received
        print(self.args.blob_sas_url)
        # This is the call that raises the HTTP 404 error
        df = pd.read_csv(self.args.blob_sas_url)

        return df

    def create_pipeline(self):
        print("Creating Pipeline")
        print(self.args.blob_sas_url)

        dataframe = self.get_Dataframe()
        # Rest of Code


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Azure ML Dataset Versioning pipeline')

    parser.add_argument('--blob_sas_url', type=str, help='SAS URL to the Data File in Blob Storage')

    args = parser.parse_args()
    ds_versioner = Pipeline(args)
    ds_versioner.create_pipeline()

In both places where I print the SAS URL within the script, print(self.args.blob_sas_url), the URL is truncated. I could see this in the std_log.txt file.


Solution

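  • Your input argument is truncated because bash treats an unquoted & as a control operator: whatever precedes each & is run as a background command, and the rest of your SAS URL is parsed as separate commands or variable assignments instead of as part of the argument. Apparently Azure substitutes the variable's value into the command as-is, so bash sees the bare & characters.

    For example: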

    python3 test_input.py --blob_sas_url "somepath/to/storage/account/file.txt?sv=2022-01-01&sr=b&sig=SOmethingwd21dd1"
    >>> output:  somepath/to/storage/account/file.txt?sv=2022-01-01&sr=b&sig=SOmethingwd21dd1
    
    python3 test_input.py --blob_sas_url somepath/to/storage/account/file.txt?sv=2022-01-01&sr=b&sig=SOmethingwd21dd1
    >>> output:  
    [1] 1961
    [2] 1962
    [2]+  Done                    sr=b
    

    So you just need to quote the Azure variable in your step command, as follows:

    python3 your_python_script.py --blob_sas_url "$(azml.sasURL)"
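
As an extra safeguard, the script could check the argument before handing it to pd.read_csv, so that a truncated URL fails with a clear message instead of an HTTP 404. The following is only a minimal sketch: the helper names missing_sas_params and check_sas_url are hypothetical, and the choice of sv and sig as required query parameters is an assumption that may need adjusting for your tokens.

```python
from urllib.parse import parse_qs, urlsplit

# Query parameters expected in the SAS URL. This tuple is an assumption
# made for illustration; adjust it to match the tokens you generate.
REQUIRED_SAS_PARAMS = ("sv", "sig")


def missing_sas_params(url, required=REQUIRED_SAS_PARAMS):
    """Return the required query parameters that are absent from url."""
    query = parse_qs(urlsplit(url).query)
    return [name for name in required if name not in query]


def check_sas_url(url):
    """Return url unchanged, or raise ValueError if it looks truncated."""
    missing = missing_sas_params(url)
    if missing:
        raise ValueError(
            "SAS URL is missing " + ", ".join(missing)
            + "; it may have been passed to the script unquoted"
        )
    return url
```

Calling check_sas_url(args.blob_sas_url) right after parse_args() would then stop the job early if the shell cut the URL at the first &.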