Search code examples
python-3.xamazon-web-servicesamazon-s3joblibdoc2vec

FileNotFoundError: [WinError 2] The system cannot find the file specified while loading model from s3


I have recently saved a model into s3 using joblib

model_doc is the model object

import subprocess
import joblib

save_d2v_to_s3_current_doc2vec_model(model_doc,"doc2vec_model")

def save_d2v_to_s3_current_doc2vec_model(model,fname):
    model_name = fname
    joblib.dump(model,model_name)
    s3_base_path = 's3://sd-flikku/datalake/current_doc2vec_model'
    path = s3_base_path+'/'+model_name
    command = "aws s3 cp {} {}".format(model_name,path).split()
    print('saving...'+model_name)
    subprocess.call(command)

It was successful, but after that when i try to load the model back from s3 it gives me an error

model = load_d2v("doc2vec_model")

def load_d2v(fname):
    model_name = fname
    s3_base_path='s3://sd-flikku/datalake/current_doc2vec_model'
    path = s3_base_path+'/'+model_name  
    command = "aws s3 cp {} {}".format(path,model_name).split()
    print('loading...'+model_name)
    subprocess.call(command)
    model=joblib.load(model_name)
    return model

This is the error i get:

loading...doc2vec_model
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in load_d2v
  File "C:\Users\prane\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 339, in call
    with Popen(*popenargs, **kwargs) as p:
  File "C:\Users\prane\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "C:\Users\prane\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 1207, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

I don't even understand why it is saying File not found, this was the path i used to save the model but now i'm unable to get the model back from s3. Please help me!!


Solution

  • I suggest that rather than your generic print() lines, showing your intent, you should print the actual command you've composed, to verify that it makes sense upon observation.

    If it does, then also try that exact same aws ... command directly, at the command prompt where you had been launching your python code, to make sure it runs that way. If it doesn't, you may get a more clear error.

    Note that the error you're getting doesn't particularly look like it's coming from the aws command, of from the S3 service - which might talk about 'paths' or 'objects'. Rather, it's from the Python subprocess system & Popen' call. I think those are via your call tosubprocess.call(), but for some reason your line-of-code isn't shown. (How are you running the block of code with theload_d2v()`?)

    That suggests the file that's no found might be the aws command itself. Are you sure it's installed & runnable from the exact working-directory/environment that your Python is running in, and invoking via subprocess.call()?

    (BTW, if my previous answer got you over your sklearn.externals.joblib problem, it'd be good for you to mark the answer as accepted, to save other potential answerers from thinking that's still an unsolved question that's blocking you.)