
PySFTP and get_r using Python - "No such file or directory"


I have a "simple" process that needs to grab data from another server and copy a directory (and all of its sub-directories) to my server.

The code is as follows:

import pysftp


dbfs_path = '/dbfs/mnt/aaa/bbb/output/{}/'.format(dbutils.widgets.get("run_name"))
remote_path = '/mst_bbb/{}/output/{}/'.format(bucket,dbutils.widgets.get("run_name"))
cnopts = pysftp.CnOpts()
cnopts.hostkeys = None   

srv = pysftp.Connection(host=host_name, username="xxx",password="yyy",cnopts=cnopts)

srv.get_r(remote_path,dbfs_path)

It was working fine until I needed to fetch the same directories more than once, at which point it raised an error saying that

the directory already exists

No problem, I thought, and did the following:

import shutil
shutil.rmtree(dbfs_path)

Then I re-ran the code.

But now I get a very different error:

---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-16-9f782d79e03f> in <module>()
     12 
     13 srv = pysftp.Connection(host=host_name, username="xxx",password="yyy",cnopts=cnopts)
---> 14 srv.get_r(remote_path,dbfs_path)

/databricks/python/local/lib/python2.7/site-packages/pysftp/__init__.pyc in get_r(self, remotedir, localdir, preserve_mtime)
    309             self.get(fname,
    310                      reparent(localdir, fname),
--> 311                      preserve_mtime=preserve_mtime)
    312 
    313     def getfo(self, remotepath, flo, callback=None):

/databricks/python/local/lib/python2.7/site-packages/pysftp/__init__.pyc in get(self, remotepath, localpath, callback, preserve_mtime)
    247             sftpattrs = self._sftp.stat(remotepath)
    248 
--> 249         self._sftp.get(remotepath, localpath, callback=callback)
    250         if preserve_mtime:
    251             os.utime(localpath, (sftpattrs.st_atime, sftpattrs.st_mtime))

/databricks/python/local/lib/python2.7/site-packages/paramiko/sftp_client.pyc in get(self, remotepath, localpath, callback)
    767             Added the ``callback`` param
    768         """
--> 769         with open(localpath, 'wb') as fl:
    770             size = self.getfo(remotepath, fl, callback)
    771         s = os.stat(localpath)

IOError: [Errno 2] No such file or directory: u'/dbfs/aaa/bbb/output/run_job/./mst_bbb/pri1/output/run_job/date=2017-12-01/2017-12-01_output_0.csv.gz'

Any ideas what might be causing this problem? I can't figure it out.

Thanks


Solution

  • I believe that the target directory of get_r (the localdir argument) has to exist; pysftp won't create it for you.

    Your call to shutil.rmtree, however, removes not only the directory's contents but the directory itself.

    Recreate the directory afterwards:

    import os

    shutil.rmtree(dbfs_path)
    os.mkdir(dbfs_path)
    

    Though actually, I do not understand your original problem: I do not see why you would be getting the "the directory already exists" error. Maybe you should ask about that problem rather than implementing an inefficient workaround.
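
The recreate step described above can be sketched as a small helper. This is only an illustration: the name reset_local_dir is hypothetical and not part of pysftp. It uses os.makedirs instead of os.mkdir, on the assumption that parent directories might be missing as well. The commented-out lines at the end assume the srv, remote_path and dbfs_path variables from the question.

```python
import os
import shutil


def reset_local_dir(path):
    """Leave `path` as an existing, empty directory.

    Hypothetical helper, not part of pysftp. If the directory exists it is
    removed together with its contents, then it is created again so that a
    later download has an existing target to write into.
    """
    if os.path.isdir(path):
        # rmtree deletes the directory itself, not just what is inside it
        shutil.rmtree(path)
    # makedirs also creates any missing parent directories
    os.makedirs(path)
    return path


# Usage with the names from the question (assumed to be defined already):
# reset_local_dir(dbfs_path)
# srv.get_r(remote_path, dbfs_path)
```

Calling the helper before every download keeps the target directory in place while still clearing out files left over from a previous run.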