
How to upload a file from my session to an Azure datastore?


I download a data file from my Azure datastore, preprocess it, and then want to upload the processed file back to the datastore as a final CSV. How do I do that? I tried the approach below, but it gives me a directory error:

import os

from azureml.core import Dataset
from sklearn.model_selection import train_test_split

# ws is my azureml Workspace object
datastore = ws.get_default_datastore()
datastore_paths_train = [(datastore, 'X.csv')]
traindata = Dataset.Tabular.from_delimited_files(path=datastore_paths_train)
train = traindata.to_pandas_dataframe()


#preprocessing the data 
X, y = preprocess_data(train)


#splitting the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# uploading data to datastore
print('Uploading data to datastore')
outputs_folder = './Scaling_data'
os.makedirs(outputs_folder, exist_ok=True)
datastore.upload(X_train, outputs_folder)  # this line raises the directory error

How do I turn my 'X_train' into a directory? I tried making it a Path object, but that didn't work either. I might be wrong here; if there are other ways to upload a CSV to the datastore, I would be happy to learn them.


Solution

  • The signature of the datastore.upload method, shown below, requires you to specify a source directory of files to upload. See here for more details.

    upload(src_dir, target_path=None, overwrite=False, show_progress=True)

    So you need to save the dataframe X_train to a local file first. See the example below:

    import os

    outputs_folder = "./Scaling_data"
    
    # create the local directory if it does not exist
    if not os.path.exists(outputs_folder):
        os.mkdir(outputs_folder)
    
    local_path = './Scaling_data/prepared.csv'
    
    # save dataframe X_train to the local file './Scaling_data/prepared.csv'
    X_train.to_csv(local_path)
    
    # upload the local file from src_dir to the target_path in the datastore
    datastore.upload(src_dir=outputs_folder, target_path=outputs_folder)
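
    Once the file is uploaded, you can load it back from the datastore the same way the question loads X.csv. A minimal sketch, assuming the target_path above (so the file sits at Scaling_data/prepared.csv on the datastore) and Dataset imported from azureml.core as in the question:

    # read the uploaded CSV back as a TabularDataset
    datastore_paths = [(datastore, 'Scaling_data/prepared.csv')]
    prepared = Dataset.Tabular.from_delimited_files(path=datastore_paths)
    df = prepared.to_pandas_dataframe()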
    

    You can also check out this example.
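
    On newer versions of the azureml-core SDK, datastore.upload is deprecated in favor of Dataset.File.upload_directory. A minimal sketch of that route, assuming the same local outputs_folder as above:

    from azureml.core import Dataset
    from azureml.data.datapath import DataPath

    # upload every file in the local folder to Scaling_data/ on the datastore
    Dataset.File.upload_directory(
        src_dir=outputs_folder,
        target=DataPath(datastore, 'Scaling_data'),
        overwrite=True,
    )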